THERAPEUTIC AND DIAGNOSTIC AGENTS CAPABLE OF MODULATING CELLULAR RESPONSIVENESS TO
CYTOKINES

FIELD OF THE INVENTION 

The present invention relates generally to therapeutic and diagnostic agents. More particularly,
the present invention provides therapeutic molecules capable of modulating signal transduction
such as but not limited to cytokine-mediated signal transduction. The molecules of the present
invention are useful, therefore, in modulating cellular responsiveness to cytokines as well as other
mediators of signal transduction such as endogenous or exogenous molecules, antigens, microbes
and microbial products, viruses or components thereof, ions, hormones and parasites.

Bibliographic details of the publications referred to in this specification by author are collected
at the end of the description. Sequence Identity Numbers (SEQ ID NOs.) for the nucleotide and
amino acid sequences referred to in the specification are defined after the bibliography. A
summary of the SEQ ID NOs is given in Table 1.

Throughout this specification and the claims which follow, unless the context requires otherwise,
the word "comprise", or variations such as "comprises" or "comprising", will be understood to
imply the inclusion of a stated integer or group of integers but not the exclusion of any other
integer or group of integers.

BACKGROUND OF THE INVENTION 

Cells continually monitor their environment in order to modulate physiological and biochemical
processes, which in turn affect future behaviour. Frequently, a cell's initial interaction with its
surroundings occurs via receptors expressed on the plasma membrane. Activation of these
receptors, whether through binding endogenous ligands (such as cytokines) or exogenous ligands
(such as antigens), triggers a biochemical cascade from the membrane through the cytoplasm to
the nucleus.

Of the endogenous ligands, cytokines represent a particularly important and versatile group.
Cytokines are proteins which regulate the survival, proliferation, differentiation and function of
a variety of cells within the body [Nicola, 1994]. The haemopoietic cytokines have in common
a four-alpha helical bundle structure and the vast majority interact with a structurally related
family of cell surface receptors, the type I and type II cytokine receptors [Bazan, 1990; Sprang,
1993]. In all cases, ligand-induced receptor aggregation appears to be a critical event in initiating
intracellular signal transduction cascades. Some cytokines, for example growth hormone,
erythropoietin (Epo) and granulocyte-colony-stimulating factor (G-CSF), trigger receptor
homodimerisation, while for other cytokines, receptor heterodimerisation or heterotrimerisation
is crucial. In the latter cases, several cytokines share common receptor subunits and on this basis
can be grouped into three subfamilies with similar patterns of intracellular activation and similar
biological effects [Hilton, 1994]. Interleukin-3 (IL-3), IL-5 and granulocyte-macrophage
colony-stimulating factor (GM-CSF) use the common β-receptor subunit (βc) and each cytokine
stimulates the production and functional activity of granulocytes and macrophages. IL-2, IL-4,
IL-7, IL-9 and IL-15 each use the common γ-chain (γc), while IL-4 and IL-13 share an
alternative γ-chain (γ'c or IL-13 receptor α-chain). Each of these cytokines plays an important
role in regulating acquired immunity in the lymphoid system. Finally, IL-6, IL-11, leukaemia
inhibitory factor (LIF), oncostatin-M (OSM), ciliary neurotrophic factor (CNTF) and
cardiotrophin (CT) share the receptor subunit gp130. Each of these cytokines appears to be
highly pleiotropic, having effects both within and outside the haemopoietic system [Nicola,
1994].



In all of the above cases at least one subunit of each receptor complex contains the conserved
sequence elements, termed box1 and box2, in its cytoplasmic tail [Murakami, 1991]. Box1
is a proline-rich motif which is located more proximal to the transmembrane domain than the
acidic box2 element. The box1 region serves as the binding site for a class of cytoplasmic
tyrosine kinases termed JAKs (Janus kinases). Ligand-induced receptor dimerisation serves to
increase the catalytic activity of the associated JAKs through cross-phosphorylation. Activated
JAKs then tyrosine phosphorylate several substrates, including the receptors themselves.
Specific phosphotyrosine residues on the receptor then serve as docking sites for SH2-containing
proteins, the best characterised of which are the signal transducers and activators of transcription
(STATs) and the adaptor protein Shc. The STATs are then phosphorylated on tyrosines,
probably by JAKs, dissociate from the receptor and form either homodimers or heterodimers
through the interaction of the SH2 domain of one STAT with the phosphotyrosine residue of the
other. STAT dimers then translocate to the nucleus where they bind to specific cytokine-responsive
promoters and activate transcription [Darnell, 1994; Ihle, 1995; Ihle, 1995]. In a
separate pathway, tyrosine-phosphorylated Shc interacts with another SH2 domain-containing
protein, Grb-2, leading ultimately to activation of members of the MAP kinase family and in turn
transcription factors such as fos and jun [Sato, 1993; Cutler, 1993]. These pathways are not
unique to members of the cytokine receptor family, since cytokines that bind receptor tyrosine
kinases are also able to activate STATs and members of the MAP kinase family [David, 1996;
Leaman, 1996; Shuai, 1993; Sato, 1993; Cutler, 1993].

Four members of the JAK family of cytoplasmic tyrosine kinases have been described, JAK1,
JAK2, JAK3 and TYK2, each of which binds to a specific subset of cytokine receptor subunits.
Six STATs have been described (STAT1 through STAT6), and these too are activated by
distinct cytokine/receptor complexes. For example, STAT1 appears to be functionally specific
to the interferon system, STAT4 appears to be specific to IL-12, while STAT6 appears to be
specific for IL-4 and IL-13. Thus, despite common activation mechanisms, some degree of
cytokine specificity may be achieved through the use of specific JAKs and STATs [Thierfelder,
1996; Kaplan, 1996; Takeda, 1996; Shimoda, 1996; Meraz, 1996; Durbin, 1996].

In addition to those described above, there are clearly other mechanisms of activation of these
pathways. For example, the JAK/STAT pathway appears to be able to activate MAP kinases
independently of the Shc-induced pathway [David, 1995], and the STATs themselves can be
activated without binding to the receptor, possibly by direct interaction with JAKs [Gupta,
1996]. Conversely, full activation of STATs may require the action of MAP kinase in addition
to that of JAKs [David, 1995; Wen, 1995].

While the activation of these signalling pathways is becoming better understood, little is known
of their regulation, including the use of negative or positive feedback loops. This is important
since, once a cell has begun to respond to a stimulus, it is critical that the intensity and duration
of the response are regulated and that signal transduction is switched off. It is likewise desirable
to be able to increase the intensity of a response, systemically or even locally, as the situation
requires.

In work leading up to the present invention, the inventors sought to isolate negative regulators
of signal transduction. The inventors have now identified a new family of proteins which are
capable of acting as regulators of signalling. The new family of proteins is defined as the
suppressor of cytokine signalling (SOCS) family based on the ability of the initially identified
SOCS molecules to suppress cytokine-mediated signalling. It should be noted, however, that
not all members of the SOCS family necessarily share suppressor function or target solely
cytokine-mediated signalling. The SOCS family comprises at least three classes of protein
molecules, distinguished by amino acid sequence motifs located N-terminal of a C-terminal motif
called the SOCS box. The identification of this new family of regulatory molecules permits the
generation of a range of effector or modulator molecules capable of modulating signal
transduction and, hence, cellular responsiveness to a range of molecules including cytokines.

The present invention, therefore, provides therapeutic and diagnostic agents based on SOCS
proteins, derivatives, homologues, analogues and mimetics thereof as well as agonists and
antagonists of SOCS proteins.

SUMMARY OF THE INVENTION 

The present invention provides inter alia nucleic acid molecules encoding members of the SOCS
family of proteins as well as the proteins themselves. Reference hereinafter to "SOCS"
encompasses any or all members of the SOCS family. Specific SOCS molecules are defined
numerically such as, for example, SOCS1, SOCS2 and SOCS3. The species from which the
SOCS has been obtained may be indicated by a single-letter prefix, where "h" is human, "m" is
murine and "r" is rat. Accordingly, "mSOCS1" is a specific SOCS from a murine animal.
Reference herein to "SOCS" is not intended to imply that the protein solely suppresses
cytokine-mediated signal transduction, as the molecule may modulate other effector-mediated
signal transduction, such as that mediated by hormones or other endogenous or exogenous
molecules, antigens, microbes and microbial products, viruses or components thereof, ions and
parasites. The term "modulates" encompasses up-regulation, down-regulation as well as
maintenance of particular levels.

One aspect of the present invention provides a nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its
C-terminal region.

Another aspect of the present invention provides a nucleic acid molecule comprising a sequence
of nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its
C-terminal region and a protein:molecule interacting region.

Yet another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a
SOCS box in its C-terminal region and a protein:molecule interacting region located in a region
N-terminal of the SOCS box.

Preferably, the protein:molecule interacting region is a protein:DNA or protein:protein binding
region.

Still a further aspect of the present invention provides a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a
SOCS box in its C-terminal region and one or more of an SH2 domain, WD-40 repeats or
ankyrin repeats N-terminal of the SOCS box.
Even still a further aspect of the present invention is directed to a nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
protein or a derivative, homologue, analogue or mimetic thereof or a nucleotide sequence
capable of hybridizing thereto under low stringency conditions at 42°C wherein said protein
comprises a SOCS box in its C-terminal region wherein the SOCS box comprises the amino acid
sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein:
X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P;

and a protein:molecule interacting region such as but not limited to one or more of an SH2
domain, WD-40 repeats and/or ankyrin repeats N-terminal of the SOCS box.
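The SOCS box consensus defined above can be read as a pattern over the one-letter amino acid codes of Table 2, which makes it straightforward to screen candidate sequences computationally. The Python sketch below is illustrative only and is not part of the specification: the helper name `socs_box_pattern`, the use of a regular expression, and the default upper bound of 50 residues for each spacer are assumptions chosen to mirror the definition. The example peptide is synthetic, assembled position by position to satisfy the consensus; it is not a SOCS sequence from the Sequence Listing.

```python
import re

# Residue class shared by X1, X4, X12, X17, X20, X24 and X28 in the definition above.
HYDROPHOBIC = "[LIVMAP]"


def socs_box_pattern(n_max: int = 50) -> re.Pattern:
    """Compile a regex for the SOCS box consensus (hypothetical helper).

    "." stands for "any amino acid". n_max is the assumed upper bound on
    each of the spacers [Xi]n and [Xj]n; the lower bound is always 1.
    """
    spacer = f".{{1,{n_max}}}"  # e.g. ".{1,50}"
    parts = [
        HYDROPHOBIC,     # X1
        ".",             # X2
        "[PTS]",         # X3
        HYDROPHOBIC,     # X4
        "..",            # X5, X6
        "[LIVMAFYW]",    # X7
        "[CTS]",         # X8
        "[RKH]",         # X9
        "..",            # X10, X11
        HYDROPHOBIC,     # X12
        "...",           # X13, X14, X15
        "[LIVMAPGCTS]",  # X16
        spacer,          # [Xi]n
        HYDROPHOBIC,     # X17
        "..",            # X18, X19
        HYDROPHOBIC,     # X20
        "P",             # X21
        "[LIVMAPG]",     # X22
        "[PN]",          # X23
        spacer,          # [Xj]n
        HYDROPHOBIC,     # X24
        "..",            # X25, X26
        "[YF]",          # X27
        HYDROPHOBIC,     # X28
    ]
    return re.compile("".join(parts))


# Synthetic peptide built to satisfy each position (not a real SOCS box).
synthetic = "LAPLQHLCRLAIRQAV" + "GS" + "LEELPLP" + "KGE" + "VSDYL"
print(socs_box_pattern().fullmatch(synthetic) is not None)   # True
print(socs_box_pattern(1).fullmatch(synthetic) is not None)  # False: spacers exceed 1 residue
```

A regular expression was chosen because every position in the consensus is either a fixed residue class or a bounded run of arbitrary residues, which maps directly onto character classes and `{1,n}` repetition; `search` rather than `fullmatch` would be used to locate a box inside a longer protein.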

Another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein:
X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the SOCS molecules modulate signal transduction such as that initiated by a cytokine
or hormone or other endogenous or exogenous molecule, a microbe or microbial product, an
antigen or a parasite.

More preferably, the SOCS molecules modulate cytokine-mediated signal transduction.

Still another aspect of the present invention provides a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable
of hybridizing thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) is capable of modulating signal transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein:
X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(iii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the signal transduction is mediated by a cytokine such as one or more of EPO, TPO,
G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or
M-CSF.

Preferably, the signal transduction is mediated by one or more of Interleukin 6 (IL-6), Leukaemia
Inhibitory Factor (LIF), Oncostatin M (OSM), Interferon (IFN)-γ and/or thrombopoietin.

Preferably, the signal transduction is mediated by IL-6. 

Particularly preferred nucleic acid molecules comprise nucleotide sequences substantially as set
forth in SEQ ID NO:3 (mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ
ID NO:9 (hSOCS1), SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15
and SEQ ID NO:16 (hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ
ID NO:20 (mSOCS6), SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24
(mSOCS7), SEQ ID NO:26 and SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ
ID NO:30 (mSOCS9), SEQ ID NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33
and SEQ ID NO:34 (hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12),
SEQ ID NO:38 and SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42
(hSOCS13), SEQ ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47
(hSOCS15) or a nucleotide sequence having at least about 15% similarity to all or a region of
any of the listed sequences or a nucleic acid molecule capable of hybridizing to any one of the
listed sequences under low stringency conditions at 42°C.

Another aspect of the present invention relates to a protein or a derivative, homologue, analogue
or mimetic thereof comprising a SOCS box in its C-terminal region.

Yet another aspect of the present invention is directed to a protein or a derivative, homologue, 
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a 
protein:molecule interacting region. 

Even yet another aspect of the present invention provides a protein or a derivative, homologue,
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a
protein:molecule interacting region located in a region N-terminal of the SOCS box.

Preferably, the protein:molecule interacting region is a protein:DNA or a protein:protein binding
region.



SUBSTITUTE SHEET (Rule 26) 



wo 98/20023 



PCT/AU97/00729 




Another aspect of the present invention contemplates a protein or a derivative, homologue,
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and an SH2 domain,
WD-40 repeats or ankyrin repeats N-terminal of the SOCS box.



Still yet another aspect of the present invention provides a protein or a derivative, homologue,
analogue or mimetic thereof exhibiting the following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein:
X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the proteins modulate signal transduction such as cytokine-mediated signal
transduction.



Preferred cytokines are EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF,
IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.


A particularly preferred cytokine is IL-6. 

Even yet another aspect of the present invention provides a protein or derivative, homologue,
analogue or mimetic thereof exhibiting the following characteristics:
(i) is capable of modulating signal transduction such as cytokine-mediated signal
transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein:
X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;




[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(iii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Particularly preferred SOCS proteins comprise an amino acid sequence substantially as set forth
in SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15) or an amino acid
sequence having at least 15% similarity to all or a region of any one of the listed sequences.


Another aspect of the present invention contemplates a method of modulating levels of a SOCS 
protein in a cell said method comprising contacting a cell containing a SOCS gene with an 
effective amount of a modulator of SOCS gene expression or SOCS protein activity for a time 
and under conditions sufficient to modulate levels of said SOCS protein. 


A related aspect of the present invention provides a method of modulating signal transduction 
in a cell containing a SOCS gene comprising contacting said cell with an effective amount of a 
modulator of SOCS gene expression or SOCS protein activity for a time sufficient to modulate 
signal transduction. 


Yet a further related aspect of the present invention is directed to a method of influencing
interaction between cells wherein at least one cell carries a SOCS gene, said method comprising
contacting the cell carrying the SOCS gene with an effective amount of a modulator of SOCS
gene expression or SOCS protein activity for a time sufficient to modulate signal transduction.

In accordance with the present invention, n in [Xi]n and [Xj]n may, in addition to being 1-50,
be 1-30, 1-20, 1-10 or 1-5.
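Because the SOCS box consensus has 28 fixed positions (X1 to X28) plus the two variable spacers [Xi]n and [Xj]n, each narrower range of n also bounds the overall length of the SOCS box. The short Python sketch below makes that arithmetic explicit. The function name is hypothetical, and the sketch assumes each spacer takes a length within the stated range; if both spacers were required to share the same n, the minimum and maximum would be unchanged.

```python
from typing import Tuple

FIXED_POSITIONS = 28  # X1 to X28 in the SOCS box consensus


def socs_box_length_range(n_min: int, n_max: int) -> Tuple[int, int]:
    """Shortest and longest SOCS box length when each spacer ([Xi]n, [Xj]n)
    is between n_min and n_max residues long (hypothetical helper)."""
    return FIXED_POSITIONS + 2 * n_min, FIXED_POSITIONS + 2 * n_max


# The ranges of n named in the text.
for upper in (50, 30, 20, 10, 5):
    shortest, longest = socs_box_length_range(1, upper)
    print(f"n = 1-{upper}: SOCS box of {shortest} to {longest} residues")
```

For the five ranges named above this gives boxes of 30 to 128, 30 to 88, 30 to 68, 30 to 48 and 30 to 38 residues respectively.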

A summary of the SEQ ID NOs referred to in the subject specification is given in Table 1. 
TABLE 1

SUMMARY OF SEQUENCE IDENTITY NUMBERS

SEQUENCE                                                        SEQ ID NO.

PCR primer                                                      1
PCR primer                                                      2
Mouse SOCS1 (nucleotide)                                        3
Mouse SOCS1 (amino acid)                                        4
Mouse SOCS2 (nucleotide)                                        5
Mouse SOCS2 (amino acid)                                        6
Mouse SOCS3 (nucleotide)                                        7
Mouse SOCS3 (amino acid)                                        8
Human SOCS1 (nucleotide)                                        9
Human SOCS1 (amino acid)                                        10
Rat SOCS1 (nucleotide)                                          11
Rat SOCS1 (amino acid)                                          12
nucleotide sequence of murine SOCS4                             13
amino acid sequence of murine SOCS4                             14
nucleotide sequence of SOCS4 cDNA human contig 4.1              15
nucleotide sequence of SOCS4 cDNA human contig 4.2              16
nucleotide sequence of murine SOCS5                             17
amino acid sequence of murine SOCS5                             18
nucleotide sequence of human SOCS5                              19
nucleotide sequence of murine SOCS6                             20
amino acid sequence of murine SOCS6                             21
nucleotide sequence of human SOCS6 contig h6.1                  22
nucleotide sequence of human SOCS6 contig h6.2                  23
nucleotide sequence of murine SOCS7                             24
amino acid sequence of murine SOCS7                             25
nucleotide sequence of human SOCS7 contig h7.1                  26
nucleotide sequence of human SOCS7 contig h7.2                  27
nucleotide sequence of murine SOCS8                             28
amino acid sequence of murine SOCS8                             29
nucleotide sequence of murine SOCS9                             30
nucleotide sequence of human SOCS9                              31
nucleotide sequence of murine SOCS10                            32
nucleotide sequence of human SOCS10 contig h10.1                33
nucleotide sequence of human SOCS10 contig h10.2                34
nucleotide sequence of human SOCS11                             35
amino acid sequence of human SOCS11                             36
nucleotide sequence of mouse SOCS12                             37
nucleotide sequence of human SOCS12 contig h12.1                38
nucleotide sequence of human SOCS12 contig h12.2                39
nucleotide sequence of murine SOCS13                            40
amino acid sequence of murine SOCS13                            41
nucleotide sequence of human SOCS13 cDNA contig h13.1           42
nucleotide sequence of murine SOCS14 cDNA                       43
amino acid sequence of murine SOCS14                            44
nucleotide sequence of murine SOCS15 cDNA                       45
amino acid sequence of murine SOCS15                            46
nucleotide sequence of human SOCS15                             47
amino acid sequence of human SOCS15                             48
Single- and three-letter abbreviations are used to denote amino acid residues and these are
summarized in Table 2.

TABLE 2

Amino Acid         Three-letter       One-letter
                   Abbreviation       Symbol

Alanine            Ala                A
Arginine           Arg                R
Asparagine         Asn                N
Aspartic acid      Asp                D
Cysteine           Cys                C
Glutamine          Gln                Q
Glutamic acid      Glu                E
Glycine            Gly                G
Histidine          His                H
Isoleucine         Ile                I
Leucine            Leu                L
Lysine             Lys                K
Methionine         Met                M
Phenylalanine      Phe                F
Proline            Pro                P
Serine             Ser                S
Threonine          Thr                T
Tryptophan         Trp                W
Tyrosine           Tyr                Y
Valine             Val                V
Any residue        Xaa                X
BRIEF DESCRIPTION OF THE DRAWINGS

In some of the Figures, abbreviations are used to denote SOCS proteins with certain binding
motifs. SOCS proteins which contain WD-40 repeats are referred to as WSB1-WSB4. SOCS
proteins with ankyrin repeats are referred to as ASB1-ASB3.

Figure 1 is a diagrammatic representation showing generation of an IL-6-unresponsive M1 clone
by retroviral infection. The RUFneo retrovirus is shown, with the positions of landmark
restriction endonuclease cleavage sites, the 4A2 cDNA insert and the PCR primer sequences.

Figure 2 is a photographic representation of Southern and Northern analysis. (Left and Middle
Panels) Southern blot analysis of genomic DNA from clone 4A2 and a control infected M1 clone.
DNA was digested with BamH I, to reveal the number of retroviruses carried by each clone, and
with Sac I, to estimate the size of the retroviral cDNA insert. Left panel: probed with neo. Middle
panel: probed with the Xho I-digested 4A2 PCR product. (Right Panel) Northern blot analysis
of total RNA from clone 4A2 and a control infected M1 clone, probed with the Xho I-digested
4A2 PCR product. The two bands represent unspliced and spliced retroviral transcripts,
resulting from splice donor and acceptor sites in the retroviral genome.

Figure 3 is a representation of the nucleotide sequence and structure of the SOCS1 gene. A.
The genomic context of SOCS1 in relation to the protamine gene cluster on murine chromosome
16. The accession number of this locus is MMPRMGNS (direct submission; G. Schlueter, 1995)
for the mouse and BTPRMTNP2 for the rat (direct submission; G. Schlueter, 1996). B. The
nucleotide sequence of the SOCS1 cDNA and deduced amino acid sequence. Conventional
one-letter abbreviations are used for the amino acid sequence and the asterisk indicates the stop
codon. The polyadenylation signal sequence is underlined. The coding region is shown in
uppercase and the untranslated region is shown in lower case.

Figure 4 is a graphical representation of cell differentiation in the presence of cytokines.
Semi-solid agar cultures of parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1
(4A2 and M1.mpl.SOCS1) were used, and the percentage of colonies which differentiated in
response to a titration of 1 mg/ml IL-6 (•), 100 ng/ml LIF (0), 1 mg/ml OSM (□), 100 ng/ml
IFN-γ (^), 500 ng/ml TPO (• ), or 3x 10"^ M dexamethasone (H«) was determined.

Figure 5 is a photographic representation of cytospins of liquid cultures of parental M1 cells
(M1 and M1.mpl) and M1 cells expressing SOCS1 (4A2 and M1.mpl.SOCS1) cultured for 4
days in the presence of 10 ng/ml IL-6 or saline. Unlike parental M1 cells, M1 cells constitutively
expressing SOCS1 (4A2 and M1.mpl.SOCS1) do not show morphological features consistent
with macrophage differentiation when cultured in IL-6.

Figure 6 is a photographic representation showing inhibition of phosphorylation of signalling
molecules by SOCS1. Parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1
(4A2 and M1.mpl.SOCS1) were incubated in the absence (-) or presence (+) of 10 ng/ml of IL-6
for 4 minutes at 37°C. Cells were then lysed and extracts were either immunoprecipitated using
anti-mouse gp130 antibody prior to SDS-PAGE (two upper panels) or were electrophoresed
directly (two lower panels). Gels were blotted and the filters were then probed with anti-phosphotyrosine
(upper panel), anti-gp130 antibody (second top panel), anti-phospho-STAT3
(second bottom panel) or anti-STAT3 (lower panel). Blots were visualised using peroxidase-conjugated
secondary antibodies and Enhanced Chemiluminescence (ECL) reagents.

Figure 7 is a representation of protein extracts prepared from (A) M1 cells or M1 cells
expressing SOCS1 (4A2) and (B) M1.mpl cells or M1.mpl.SOCS1 cells incubated for 10 min
at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml IL-6 or 100 ng/ml IFN-γ.
The binding reactions contained 4-6 µg protein (constant within a given experiment), 5 ng ³²P-labelled
m67 oligonucleotide encoding the high-affinity SIF (c-sis-inducible factor) binding site,
and 800 ng sonicated salmon sperm DNA. For certain experiments, protein samples were
preincubated with an excess of unlabelled m67 oligonucleotide, or with antibodies specific for either
STAT1 or STAT3.

Figure 8 is a photographic representation of Northern hybridisation. Mice were injected
intravenously with 2 µg and after various periods of time, the livers were removed and polyA+
mRNA was purified. M1 cells were stimulated for various lengths of time with 500 ng/ml of IL-6,
after which polyA+ mRNA was isolated. mRNA was fractionated by electrophoresis and
immobilized on nylon filters. Northern blots were prehybridized, hybridized with random-primed
³²P-labelled SOCS1 or GAPDH DNA fragments, washed and exposed to film overnight.

Figure 9 is a representation of a comparison of the amino acid sequences of SOCS1, SOCS2,
SOCS3 and CIS. Alignment of the predicted amino acid sequence of mouse (mm), human (hs)
and rat (rr) SOCS1, SOCS2, SOCS3 and CIS. Shaded residues are conserved in three or
four mouse SOCS family members. The SH2 domain is boxed in solid lines, while the SOCS box
is bounded by double lines.

Figure 10 is a photographic representation showing the phenotype of the IL-6-unresponsive M1 cell
clone 4A2. Colonies of parental M1 cells (left panel) and clone 4A2 (right panel) cultured in
semi-solid agar for 7 days in saline or 100 ng/ml IL-6.

Figure 11 is a photographic representation showing expression of mRNA for SOCS family
members in vitro and in vivo.

(A) Northern analysis of mRNA from a range of mouse organs showing constitutive
expression of SOCS family members in a limited number of tissues.

(B) Northern analysis of mRNA from liver and M1 cells showing induction of expression of
SOCS family members following exposure to IL-6.

(C) Reverse transcriptase PCR analysis of mRNA from bone marrow showing induction of
expression of SOCS family members by a range of cytokines.

Figure 12 is a photographic representation showing that SOCS1 suppresses the phosphorylation and
activation of gp130 and STAT3.

(A) Western blots of extracts from parental M1 cells (M1 and M1.mpl) and M1 cells
expressing SOCS1 (4A2 and M1.mpl.SOCS1) stimulated with (+) or without (-) 100 ng/ml IL-6.
Top: Extracts immunoprecipitated with anti-gp130 (αgp130) and immunoblotted with anti-phosphotyrosine
(αPY-STAT3), or for STAT3 (αSTAT3) to demonstrate equal loading of
protein. The molecular weights of the bands are shown on the right.

(B) EMSA of M1.mpl and M1.mpl.SOCS1 cells stimulated with (+) and without (-) 100
ng/ml IL-6 or 100 ng/ml IFN-γ. The DNA-binding complexes SIF A, B and C are indicated at
the left.

Figure 13 is a representation of a comparison of the amino acid sequences of the SOCS proteins.
(A) Schematic representation of structures of SOCS proteins, including proteins which contain
WD-40 repeats (WSB) and ankyrin repeats (ASB). (B) Alignment of N-terminal regions of
SOCS proteins. (C) Alignment of the SH2 domains of CIS, SOCS1, 2, 3, 5, 9, 11 and 14. (D)
Alignment of the WD-40 repeats of SOCS4, SOCS6, SOCS13 and SOCS15. (E) Alignment of
the ankyrin repeats of SOCS7 and SOCS10. (F) Alignment of the regions between the SH2 domain,
WD-40 repeats or ankyrin repeats and the SOCS box. (G) Alignment of the SOCS box. In each case the
conventional one-letter abbreviations for amino acids are used, with X denoting residues of
uncertain identity and OOO denoting the beginning and the end of contigs. Amino acid
sequence obtained from conceptual translation of nucleic acid sequence derived from isolated
cDNAs is shown in upper case, while amino acid sequence obtained by conceptual translation of
ESTs is shown in lower case and is approximate only. Conserved residues, defined as (LIVMA),
(FYW), (DE), (QN), (CST), (KRH) and (PG), are shaded in the SH2 domain, WD-40 repeats,
ankyrin repeats and the SOCS box. For the alignment of SH2 domains, WD-40 repeats and
ankyrin repeats, a consensus sequence is shown above. In each case this has been derived from
examination of a large and diverse set of domains (Neer et al, 1994; Bork, 1993).
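The shading rule in Figure 13 sorts residues into the seven classes listed above. The Python sketch below shows one way to test a single alignment column against those classes. The function name `conserved_group` and the `min_count` threshold are illustrative assumptions. The caption does not say how many sequences must share a class before a position is shaded.

```python
from collections import Counter
from typing import Optional

# Residue classes used for shading in Figure 13, as listed in the caption.
CONSERVATION_GROUPS = ("LIVMA", "FYW", "DE", "QN", "CST", "KRH", "PG")
RESIDUE_TO_GROUP = {aa: group for group in CONSERVATION_GROUPS for aa in group}


def conserved_group(column: str, min_count: int) -> Optional[str]:
    """Return the residue class shared by at least `min_count` residues in one
    alignment column, or None.

    `column` holds one residue per aligned sequence. Lower-case (EST-derived)
    residues are upper-cased; gaps ('-', '.') and uncertain residues ('X')
    are ignored. `min_count` is a hypothetical threshold.
    """
    counts = Counter(
        RESIDUE_TO_GROUP[aa] for aa in column.upper() if aa in RESIDUE_TO_GROUP
    )
    if not counts:
        return None
    group, count = counts.most_common(1)[0]
    return group if count >= min_count else None


# Three of the five residues (L, I, V) fall in the LIVMA class.
print(conserved_group("LIVFD", min_count=3))  # LIVMA
```

The seven classes cover all twenty standard residues. Only gaps and uncertain positions therefore drop out of the count.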

Figures 14(A) and (B) are photographic representations showing analysis of mRNA expression
of mouse SOCS1 and SOCS5 and of SOCS proteins containing a WD-40 repeat (WSB2) or ankyrin
repeats (ASB1).

Figure 15 is a representation showing the nucleotide sequence of the mouse SOCS4 cDNA. The
nucleotides encoding the mature coding region from the predicted ATG "start" codon to the stop
codon are shown in upper case, while the predicted 5' and 3' untranslated regions are shown in
lower case. The relationship of the mouse cDNA sequence to mouse and human EST contigs is
illustrated in Figure 17.

Figure 16 is a representation showing the predicted amino acid sequence of the mouse SOCS4
protein, derived from the nucleotide sequence in Figure 15. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 18 is a representation showing the nucleotide sequence of human SOCS4 cDNA contigs
h4.1 and h4.2, derived from analysis of the ESTs listed in Table 4.1. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 17.

Figure 19 is a diagrammatic representation showing the relationship of mouse SOCS5 genomic
(57-2) and cDNA (5-3-2) clones to contigs derived from analysis of mouse ESTs (Table 5.1) and
to the human cDNA clone (5-94-2) and ESTs (Table 5.2). The nucleotide sequence of the mouse
SOCS5 contig is shown in Figure 20, with the sequence of the human SOCS5 contig (h5.1) being
shown in Figure 21. The deduced amino acid sequence of mouse SOCS5 is shown in Figure
20B. The structure of the protein is shown schematically, with the SH2 domain indicated by
( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are shown by the thin
solid line.

Figure 20A is a representation showing the nucleotide sequence of mouse SOCS5 derived
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of the mouse
cDNA sequence to mouse and human EST contigs is illustrated in Figure 19.

Figure 20B is a representation of the predicted amino acid sequence of the mouse SOCS5 protein,
derived from the nucleotide sequence in Figure 20A. The SOCS box, which is also shown in
Figure 13, is underlined.

Figure 21 is a representation showing the nucleotide sequence of human SOCS5 cDNA contig
h5.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 5.2. The
relationship of this contig to the mouse cDNA sequence is illustrated in Figure 19.

Figure 22 is a diagrammatic representation showing the relationship of mouse SOCS6 cDNA
clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N and 6-5N) to contigs derived from analysis
of mouse ESTs (Table 6.1) and human ESTs (Table 6.2). The nucleotide sequence of the mouse
SOCS6 contig is shown in Figure 23, with the sequence of human SOCS6 contigs (h6.1 and
h6.2) being shown in Figure 24. The deduced amino acid sequence of mouse SOCS6 is shown
in Figure 23B. The structure of the protein is shown schematically, with the WD-40 repeats
indicated by ( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are
shown by the thin solid line.

Figure 23A is a representation showing the nucleotide sequence of mouse SOCS6 derived
from analysis of cDNA clone 64-10A-11. The nucleotides encoding part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case. The relationship of the mouse cDNA sequence to
mouse and human EST contigs is illustrated in Figure 22.

Figure 23B is a representation showing the predicted amino acid sequence of the mouse SOCS6
protein, derived from the nucleotide sequence in Figure 23A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 24 is a representation showing the nucleotide sequence of human SOCS6 cDNA contig
h6.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 6.2. The
relationship of these contigs to the mouse cDNA sequence is illustrated in Figure 22.

Figure 25 is a diagrammatic representation showing the relationship of mouse SOCS7 cDNA
clone (74-10A-11) to contigs derived from analysis of mouse ESTs (Table 7.1) and human ESTs
(Table 7.2). The nucleotide sequence of the mouse SOCS7 contig is shown in Figure 26, with
the sequence of human SOCS7 contigs (h7.1 and h7.2) being shown in Figure 27. The deduced
amino acid sequence of mouse SOCS7 is shown in Figure 26B. The structure of the protein is
shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box by ( ). The
putative 5' and 3' untranslated regions are shown by the thin solid line in the mouse and by the
wavy line in h7.2. Based on analysis of clones isolated to date and ESTs, the 3' untranslated
regions of mSOCS7 and hSOCS7 share little similarity.

Figure 26A is a representation showing the nucleotide sequence of mouse SOCS7 derived
from analysis of cDNA clone 74-10A-11. The nucleotides encoding part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case. The relationship of the mouse cDNA sequence to
mouse and human EST contigs is illustrated in Figure 25.

Figure 26B is a representation showing the predicted amino acid sequence of the mouse SOCS7
protein, derived from the nucleotide sequence in Figure 26A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 27 is a representation showing the nucleotide sequence of human SOCS7 cDNA contigs
h7.1 and h7.2, derived from analysis of the ESTs listed in Table 7.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 25.

Figure 28 is a diagrammatic representation of the relationship of sequence derived from analysis
of mouse SOCS8 ESTs (Table 8.1 and Figure 29A) to the predicted protein structure of mouse
SOCS8. The deduced partial amino acid sequence of mouse SOCS8 is shown in Figure 29B.
The structure of the protein is shown schematically, with the SOCS box highlighted ( ). The
predicted 3' untranslated region is shown by the thin line.

Figure 29A is a representation showing the partial nucleotide sequence of mouse SOCS8 cDNA
(contig 8.1) derived from analysis of ESTs. The nucleotides encoding part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case.

Figure 29B is a representation showing the partial predicted amino acid sequence of the mouse
SOCS8 protein, derived from the nucleotide sequence in Figure 29A. The SOCS box, which is
also shown in Figure 13, is underlined.

Figure 30 is a diagrammatic representation showing the relationship of mouse SOCS9 ESTs
(Table 9.1) and human SOCS9 ESTs (Table 9.2). The nucleotide sequence of the mouse SOCS9
contig (m9.1) is shown in Figure 31, with the sequence of the human SOCS9 contig (h9.1) being
shown in Figure 32. The deduced amino acid sequence of human SOCS9 is shown
schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The putative 3'
untranslated region is shown by the thin solid line.

Figure 31 is a representation showing the partial nucleotide sequence of mouse SOCS9 cDNA
(contig m9.1), derived from analysis of the ESTs listed in Table 9.1. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 30.

Figure 32 is a representation showing the partial nucleotide sequence of human SOCS9 cDNA
(contig h9.1), derived from analysis of the ESTs listed in Table 9.2. Although it is clear that
contig h9.1 encodes a protein with an SH2 domain and a SOCS box, the quality of the sequence
is not high enough to derive a single unambiguous open reading frame. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 30.

Figure 33 is a representation showing the relationship of mouse SOCS10 cDNA clones (10-9,
10-12, 10-23 and 10-24) to contigs derived from analysis of mouse ESTs (Table 10.1) and
human ESTs (Table 10.2). The nucleotide sequence of the mouse SOCS10 contig is shown in
Figure 34, with the sequence of human SOCS10 contigs (h10.1 and h10.2) being shown in
Figure 35. The predicted structure of the protein is shown schematically, with the ankyrin
repeats indicated by ( ) and the SOCS box by ( ). The putative 3' untranslated region is shown
by the thin solid line in the mouse and by the wavy line in h10.2. Based on analysis of clones
isolated to date and ESTs, the 3' untranslated regions of mSOCS10 and hSOCS10 share little
similarity.

Figure 34 is a representation showing the nucleotide sequence of mouse SOCS10 derived
from analysis of cDNA clones 10-9, 10-12, 10-23 and 10-24. The nucleotides encoding part
of the predicted coding region, ending in the stop codon, are shown in upper case, while the
predicted 3' untranslated regions are shown in lower case. Although it is clear that contig m10.1
encodes a protein with a series of ankyrin repeats and a SOCS box, the quality of the sequence
is not high enough to derive a single unambiguous open reading frame. The relationship of the
mouse cDNA sequence to mouse and human EST contigs is illustrated in Figure 33.

Figure 35 is a representation showing the nucleotide sequence of human SOCS10 cDNA contigs
h10.1 and h10.2, derived from analysis of the ESTs listed in Table 10.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 33.

Figure 36A is a representation showing the partial nucleotide sequence of the human SOCS11
cDNA derived from analysis of the ESTs listed in Table 11.1. The nucleotides encoding the mature
coding region from the predicted ATG "start" codon to the stop codon are shown in upper case,
while the predicted 5' and 3' untranslated regions are shown in lower case. The relationship of
the partial cDNA sequence, derived from ESTs, to the predicted protein is shown in Figure 37.

Figure 36B is a representation showing the partial predicted amino acid sequence of the human
SOCS11 protein, derived from the nucleotide sequence in Figure 36A. The SOCS box, which is
also shown in Figure 13, is underlined.

Figure 37 is a diagrammatic representation showing the relationship of sequence derived from
analysis of human SOCS11 ESTs (Table 11.1 and Figure 36A) to the predicted protein structure
of human SOCS11. The deduced partial amino acid sequence of human SOCS11 is shown in
Figure 36B. The structure of the protein is shown schematically, with the SH2 domain shown
by ( ) and the SOCS box highlighted by ( ). The predicted 3' untranslated region is shown by
the thin line.

Figure 38 is a diagrammatic representation showing the relationship of mouse SOCS12 cDNA
clone (12-1) to contigs derived from analysis of mouse ESTs (Table 12.1) and human ESTs
(Table 12.2). The nucleotide sequence of the mouse SOCS12 contig is shown in Figure 39,
with the sequence of human SOCS12 contigs (h12.1 and h12.2) being shown in Figure 40. The
deduced partial amino acid sequence of mouse SOCS12 is shown in Figure 39. The structure
of the protein is shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box
by ( ). The putative 3' untranslated region is shown by the thin solid line in the mouse and
by the wavy line in h12.2. Based on analysis of clones isolated to date and ESTs, the 3'
untranslated regions of mSOCS12 and hSOCS12 share little similarity.

Figure 39 is a representation showing the nucleotide sequence of mouse SOCS12 derived
from analysis of cDNA clone 12-1 and the ESTs listed in Table 12.1. The nucleotides encoding
part of the predicted coding region, including the stop codon, are shown in upper case, while
the predicted 3' untranslated region is shown in lower case. Although by homology with human
SOCS12 it is clear that contig m12.1 encodes a protein with a series of ankyrin repeats and a SOCS box,
the quality of the sequence is not high enough to derive a single unambiguous open reading
frame. The relationship of the mouse cDNA sequence to mouse and human EST contigs is
illustrated in Figure 38.

Figure 40 is a representation showing the nucleotide sequence of human SOCS12 cDNA contigs
h12.1 and h12.2, derived from analysis of the ESTs listed in Table 12.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 38.

Figure 41 is a diagrammatic representation showing the relationship of contig m13.1, derived
from analysis of mouse SOCS13 cDNA clones (62-1, 62-6-7, 62-14) and mouse ESTs (Table
13.1), to contig h13.1, derived from analysis of human ESTs (Table 13.2). The nucleotide
sequence of the mouse SOCS13 contig is shown in Figure 42, with the sequence of the human
SOCS13 contig (h13.1) being shown in Figure 43. The deduced amino acid sequence of mouse
SOCS13 is shown in Figure 42B. The structure of the protein is shown schematically, with the
WD-40 repeats highlighted by ( ) and the SOCS box highlighted by ( ). The 3' untranslated
region is shown by the thin solid line.

Figure 42A is a representation showing the nucleotide sequence of mouse SOCS13 derived
from analysis of cDNA clones 62-1, 62-6-7 and 62-14. The nucleotides encoding part of the
predicted coding region, ending in the stop codon, are shown in upper case, while those encoding
the predicted 3' untranslated regions are shown in lower case. The relationship of the mouse cDNA
sequence to mouse and human EST contigs is illustrated in Figure 41.

Figure 42B is a representation showing the predicted amino acid sequence of the mouse SOCS13
protein, derived from the nucleotide sequence in Figure 42A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 43 is a representation showing the nucleotide sequence of human SOCS13 cDNA contig
h13.1, derived from analysis of the ESTs listed in Table 13.2. The relationship of these contigs
to the mouse cDNA sequence is illustrated in Figure 41.

Figure 44 is a diagrammatic representation showing the relationship of a partial mouse SOCS14
cDNA clone (14-1) to contigs derived from analysis of mouse ESTs (Table 14.1). The
nucleotide sequence of the mouse SOCS14 contig is shown in Figure 45. The deduced partial
amino acid sequence of mouse SOCS14 is shown in Figure 45B. The structure of the protein
is shown schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The
putative 3' untranslated region is shown by the thin line.

Figure 45A is a representation showing the nucleotide sequence of mouse SOCS14 derived
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of the mouse
cDNA sequence to mouse and human EST contigs is illustrated in Figure 44.

Figure 45B is a representation showing the predicted amino acid sequence of the mouse SOCS14
protein, derived from the nucleotide sequence in Figure 45A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 46 is a diagrammatic representation showing the relationship of contig m15.1, derived
from analysis of the mouse BAC and mouse ESTs (Table 15.1), to contig h15.1, derived from analysis
of the human BAC and human ESTs (Table 15.2). The nucleotide sequence of the mouse
SOCS15 contig is shown in Figure 47, with the sequence of the human SOCS15 contig (h15.1)
being shown in Figure 48. The deduced amino acid sequence of mouse SOCS15 is shown in
Figure 47B. The structure of the protein is shown schematically, with the WD-40 repeats
highlighted by ( ) and the SOCS box highlighted by ( ). The 5' and 3' untranslated regions are
shown by the thin solid line. The introns which interrupt the coding region are shown by ( ).

Figure 47A is a representation showing the nucleotide sequence covering the mouse SOCS15
gene, derived from analysis of the mouse BAC listed in Table 15.1. The nucleotides encoding the
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3'
untranslated region are shown in lower case. The relationship of the mouse BAC to mouse and
human EST contigs is illustrated in Figure 46.

Figure 47B is a representation showing the predicted amino acid sequence of the mouse SOCS15
protein, derived from the nucleotide sequence in Figure 47A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 48A is a representation showing the nucleotide sequence covering the human SOCS15
gene, derived from analysis of the human BAC listed in Table 15.2. The nucleotides encoding the
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3'
untranslated region are shown in lower case. The relationship of the human BAC to mouse and
human EST contigs is illustrated in Figure 46.

Figure 48B is a representation showing the predicted amino acid sequence of the human SOCS15
protein, derived from the nucleotide sequence in Figure 48A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 49 is a photographic representation showing SOCS1 inhibition of JAK2 kinase activity.
(A) Upper panel: Cos M6 cells were transiently transfected with either Flag-tagged mJAK2 and
mSOCS-1 DNA (SOCS1) or Flag-mJAK2 DNA alone (-) and lysed. JAK2 proteins were
immunoprecipitated using anti-JAK2 antibody and subjected to an in vitro kinase assay. Lower
panel: a portion of the JAK2 immunoprecipitates was Western blotted with anti-JAK2
antibody. (B) Upper panel: Cos M6 cells were transiently transfected with Flag-mJAK2 and
Flag-mSOCS-1 DNA or Flag-mJAK2 DNA alone and lysed. JAK2 proteins were immunoprecipitated
using anti-JAK2 (UBI) and separated by SDS/PAGE. Immunoprecipitates were then
analysed by Western blot with anti-phosphotyrosine antibody. Lower panel: JAK2 expression.
Cos cell lysates were separated by SDS/PAGE and analysed by Western blot with anti-FLAG
antibody (M2).

Figure 50 is a photographic representation showing interaction between JAK2 and SOCS
proteins. (A) Cos M6 cells were transiently transfected with Flag-tagged mJAK2 and various
Flag-tagged SOCS DNAs (SOCS-1, S1; SOCS-2, S2; SOCS-3, S3; CIS) or Flag-mJAK2 alone
and lysed. JAK2 proteins were immunoprecipitated using anti-JAK2 (UBI) and separated by SDS/PAGE.
Immunoprecipitates were then analysed by Western blot with anti-FLAG antibody (M2). (B)
Cos cell lysates described in (A) were separated by SDS/PAGE and expression levels of the
various proteins were determined by Western blot with anti-FLAG antibody (M2). (C) JAK2
tyrosine phosphorylation. Cos cell lysates described in (A) were separated by SDS/PAGE and
proteins analysed by Western blot with anti-phosphotyrosine antibody.

Figure 51 is a diagrammatic representation of pβgalpAloxneo.
Figure 52 is a diagrammatic representation of pβgalpAloxneoTK.

Figure 53 is a diagrammatic representation of the SOCS1 knockout construct.



DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention provides a new family of modulators of signal transduction. As the initial
members of this family suppressed cytokine signalling, the family is referred to as the
"suppressors of cytokine signalling" family or "SOCS". The SOCS family is defined by the
presence of a C-terminal domain referred to as a "SOCS box". Different classes of SOCS
molecules are defined by a motif generally, but not exclusively, located N-terminal to the SOCS
box, which is involved in protein:molecule interaction such as protein:DNA or
protein:protein interaction. Particularly preferred motifs are selected from an SH2 domain, WD-40
repeats and ankyrin repeats.

WD-40 repeats were originally recognised in the β-subunit of G-proteins. WD-40 repeats appear
to form a β-propeller-like structure and may be involved in protein-protein interactions. Ankyrin
repeats were originally recognised in the cytoskeletal protein ankyrin.

Members of the SOCS family may be identified by any number of means. For example, SOCS1
to SOCS3 were identified by their ability to suppress cytokine-mediated signal transduction and,
hence, were identified based on activity. SOCS4 to SOCS15 were identified as nucleotide
sequences exhibiting similarity at the level of the SOCS box.

The SOCS box is a conserved motif located in the C-terminal region of the SOCS molecule. In 
accordance with the present invention, the amino acid sequence of the SOCS box is: 

$$X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_{10} X_{11} X_{12} X_{13} X_{14} X_{15} X_{16} [X_i]_n X_{17} X_{18} X_{19} X_{20} X_{21} X_{22} X_{23} [X_j]_n X_{24} X_{25} X_{26} X_{27} X_{28}$$

wherein: $X_1$ is L, I, V, M, A or P;

$X_2$ is any amino acid residue;
$X_3$ is P, T or S;
$X_4$ is L, I, V, M, A or P;
$X_5$ is any amino acid;
$X_6$ is any amino acid;
$X_7$ is L, I, V, M, A, F, Y or W;
$X_8$ is C, T or S;
$X_9$ is R, K or H;
$X_{10}$ is any amino acid;
$X_{11}$ is any amino acid;
$X_{12}$ is L, I, V, M, A or P;
$X_{13}$ is any amino acid;
$X_{14}$ is any amino acid;
$X_{15}$ is any amino acid;
$X_{16}$ is L, I, V, M, A, P, G, C, T or S;
$[X_i]_n$ is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence $X_i$ may comprise the same or different amino
acids selected from any amino acid residue;
$X_{17}$ is L, I, V, M, A or P;
$X_{18}$ is any amino acid;
$X_{19}$ is any amino acid;
$X_{20}$ is L, I, V, M, A or P;
$X_{21}$ is P;
$X_{22}$ is L, I, V, M, A, P or G;
$X_{23}$ is P or N;
$[X_j]_n$ is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence $X_j$ may comprise the same or different amino
acids selected from any amino acid residue;
$X_{24}$ is L, I, V, M, A or P;
$X_{25}$ is any amino acid;
$X_{26}$ is any amino acid;
$X_{27}$ is Y or F; and
$X_{28}$ is L, I, V, M, A or P.
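The SOCS box definition above can be written as a regular expression for scanning conceptual translations. The Python sketch below is one possible encoding and rests on several assumptions. The names `SOCS_BOX_SPEC` and `find_socs_box` are hypothetical. The two spacers $[X_i]_n$ and $[X_j]_n$ may take independent lengths of 1 to 50 residues. Only the first (leftmost) match is reported. It is not presented as the procedure by which SOCS4 to SOCS15 were identified.

```python
import re
from typing import Optional, Tuple

# Position-by-position encoding of the SOCS box consensus above.
# "." matches any residue; ".{1,50}" stands for the variable spacers,
# whose lengths are treated as independent (an assumption).
SOCS_BOX_SPEC = [
    ("X1", "[LIVMAP]"), ("X2", "."), ("X3", "[PTS]"), ("X4", "[LIVMAP]"),
    ("X5", "."), ("X6", "."), ("X7", "[LIVMAFYW]"), ("X8", "[CTS]"),
    ("X9", "[RKH]"), ("X10", "."), ("X11", "."), ("X12", "[LIVMAP]"),
    ("X13", "."), ("X14", "."), ("X15", "."), ("X16", "[LIVMAPGCTS]"),
    ("[Xi]n", ".{1,50}"),
    ("X17", "[LIVMAP]"), ("X18", "."), ("X19", "."), ("X20", "[LIVMAP]"),
    ("X21", "P"), ("X22", "[LIVMAPG]"), ("X23", "[PN]"),
    ("[Xj]n", ".{1,50}"),
    ("X24", "[LIVMAP]"), ("X25", "."), ("X26", "."), ("X27", "[YF]"),
    ("X28", "[LIVMAP]"),
]

SOCS_BOX_RE = re.compile("".join(fragment for _, fragment in SOCS_BOX_SPEC))


def find_socs_box(protein: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the first SOCS-box-like match, or None."""
    match = SOCS_BOX_RE.search(protein.upper())
    return match.span() if match else None


# Synthetic peptide built to satisfy every position; not a real SOCS sequence.
demo = "MK" + "LAPLAAFCRAALAAAG" + "QQQ" + "LAAVPLP" + "QQ" + "LAAYL"
print(find_socs_box(demo))  # (2, 35)
```

The spacers use greedy quantifiers. In a long protein the reported span may therefore extend to the farthest valid $X_{28}$. A lazy spacer (`.{1,50}?`) or `re.finditer` would be reasonable alternatives if candidate boxes need to be enumerated separately.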



As stated above and in accordance with the present invention, SOCS proteins are divided into
separate classes based on the presence of a protein:molecule interacting region, such as but not
limited to an SH2 domain, WD-40 repeats and ankyrin repeats, located N-terminal of the SOCS
box. The latter three domains are protein:protein interacting domains.

Examples of SH2-containing SOCS proteins include SOCS1, SOCS2, SOCS3, SOCS5, SOCS9,
SOCS11 and SOCS14. Examples of SOCS containing WD-40 repeats include SOCS4, SOCS6
and SOCS15. Examples of SOCS containing ankyrin repeats include SOCS7, SOCS10 and
SOCS12.
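The class assignments listed above amount to a simple lookup table. The Python sketch below records them as stated. The names `SOCS_CLASSES` and `socs_class` are hypothetical. Members not named in the lists above, for example SOCS8, simply return `None`.

```python
from typing import Dict, Optional, Tuple

# SOCS classes keyed by the interaction domain N-terminal to the SOCS box,
# containing only the examples named in the text.
SOCS_CLASSES: Dict[str, Tuple[str, ...]] = {
    "SH2 domain": ("SOCS1", "SOCS2", "SOCS3", "SOCS5", "SOCS9", "SOCS11", "SOCS14"),
    "WD-40 repeats": ("SOCS4", "SOCS6", "SOCS15"),
    "ankyrin repeats": ("SOCS7", "SOCS10", "SOCS12"),
}

# Inverted index: member name -> class label.
MEMBER_TO_CLASS = {
    member: label for label, members in SOCS_CLASSES.items() for member in members
}


def socs_class(name: str) -> Optional[str]:
    """Look up the class of a SOCS member, tolerating case and spaces
    (e.g. "socs 10"); returns None for members not listed in the text."""
    return MEMBER_TO_CLASS.get(name.upper().replace(" ", ""))


print(socs_class("SOCS 14"))  # SH2 domain
```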

The present invention provides inter alia nucleic acid molecules encoding SOCS proteins,
purified naturally occurring SOCS proteins as well as recombinant forms of SOCS proteins, and
methods of modulating signal transduction by modulating activity of SOCS proteins or
expression of SOCS genes. Preferably, signal transduction is mediated by a cytokine, examples
of which include EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12,
IFNγ, TNFα, IL-1 and/or M-CSF. Particularly preferred cytokines include IL-6, LIF, OSM,
IFN-γ and/or thrombopoietin.

Accordingly, one aspect of the present invention provides an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
protein or a derivative, homologue, analogue or mimetic thereof, or comprising a nucleotide
sequence capable of hybridizing thereto under low stringency conditions at 42°C, wherein said
protein comprises a SOCS box in its C-terminal region and optionally a protein:molecule
interacting domain N-terminal of the SOCS box.

Preferably, the protein:molecule interacting domain is a protein:DNA or protein:protein
interacting domain. Most preferably, the protein:molecule interacting domain is one of an SH2
domain, WD-40 repeats and/or ankyrin repeats.

As stated above, the subject SOCS proteins preferably modulate cytokine-mediated signal transduction.
The present invention extends, however, to SOCS molecules modulating other effector-mediated
signal transduction, such as that mediated by other endogenous or exogenous molecules, antigens,
microbes and microbial products, viruses or components thereof, ions, hormones and parasites.
Endogenous molecules in this context are molecules produced within the cell carrying the SOCS
molecule. Exogenous molecules are produced by other cells or are introduced to the body.

Preferably, the nucleic acid molecule or SOCS protein is in isolated or purified form. The terms
"isolated" and "purified" mean that a molecule has undergone at least one purification step away
from other material.

Preferably, the nucleic acid molecule is in isolated form and is DNA such as cDNA or genomic
DNA. The DNA may encode the same amino acid sequence as the naturally occurring SOCS,
or the SOCS may contain one or more amino acid substitutions, deletions and/or additions. The
nucleotide sequence may correspond to the genomic coding sequence (including exons and
introns) or to the nucleotide sequence in cDNA from mRNA transcribed from the genomic gene,
or it may carry one or more nucleotide substitutions, deletions and/or additions thereto.

In a preferred embodiment, the nucleic acid molecule comprises a sequence of nucleotides 
encoding or complementary to a sequence encoding a SOCS protein or a derivative, homologue, 
analogue or mimetic thereof wherein the amino acid sequence of said SOCS protein is selected 
from SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID 
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18 
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29 
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44 
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15), or encodes an amino 
acid sequence with a single or multiple amino acid substitution, deletion and/or addition to the 
listed sequences, or is a nucleotide sequence capable of hybridizing to the nucleic acid molecule 
under low stringency conditions at 42°C. 

In an even more preferred embodiment, the present invention provides a nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a 
SOCS protein or a derivative, homologue, analogue or mimetic thereof wherein the nucleotide 
sequence is selected from a nucleotide sequence substantially set forth in SEQ ID NO:3 
(mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ ID NO:9 (hSOCS1), 
SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15 and SEQ ID NO:16 
(hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ ID NO:20 (mSOCS6), 
SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24 (mSOCS7), SEQ ID NO:26 and 
SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ ID NO:30 (mSOCS9), SEQ ID 
NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33 and SEQ ID NO:34 
(hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12), SEQ ID NO:38 and 
SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42 (hSOCS13), SEQ 
ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47 (hSOCS15), or a 
nucleotide sequence having at least about 15% similarity to all or a region of any of the listed 
sequences, or a nucleic acid molecule capable of hybridizing to any of the listed sequences under 
low stringency conditions at 42°C. 

Reference herein to low stringency at 42°C includes and encompasses from at least about 1% 
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for 
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative 
stringency conditions may be applied where necessary, such as medium stringency, which 
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and 
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M 
to at least about 0.9M salt for washing conditions, or high stringency, which includes and 
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least 
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least 
about 0.15M salt for washing conditions. 
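As an illustration only, the three stringency bands defined above can be held in a small lookup table. The helper below is hypothetical (the names `Stringency`, `STRINGENCY_LEVELS` and `classify` do not come from the specification). It assumes the stated ranges are closed intervals, that the same salt range applies to hybridisation and washing, and that temperature is fixed at 42°C as in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stringency:
    name: str
    formamide_pct: Tuple[float, float]  # % v/v formamide during hybridisation
    salt_molar: Tuple[float, float]     # salt (M); same range assumed for hybridisation and washing


# Ranges as stated in the text; all conditions are at 42 degrees C.
STRINGENCY_LEVELS = (
    Stringency("low", (1.0, 15.0), (1.0, 2.0)),
    Stringency("medium", (16.0, 30.0), (0.5, 0.9)),
    Stringency("high", (31.0, 50.0), (0.01, 0.15)),
)


def classify(formamide_pct: float, salt_molar: float) -> Optional[str]:
    """Return the band whose formamide and salt ranges both contain the inputs, else None."""
    for level in STRINGENCY_LEVELS:
        f_lo, f_hi = level.formamide_pct
        s_lo, s_hi = level.salt_molar
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return level.name
    return None


print(classify(10.0, 1.5))  # low
print(classify(40.0, 0.1))  # high
print(classify(15.5, 1.5))  # None: falls between the low and medium bands
```

Conditions that fall between the bands as written (for example 15.5% formamide) return `None` rather than being forced into the nearest band.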

In another embodiment, the present invention is directed to a SOCS protein or a derivative, 
homologue, analogue or mimetic thereof wherein said SOCS protein is identified as follows: 

human SOCS-4 characterised by EST81149, EST180909, EST182619, ya99h09, 
ye70c04, yh53c09, yh77g11, yh87h05, yi45h07, yj04e06, yq12h06, yq56a06, yq60e02, 
yq92g03, yq97h06, yi90f01, yt69c03, yv30a08, yv55f07, yv57h09, yv87h02, yv98e11, 
yw68d10, yw82a03, yx08a07, yx72h06, yx76b09, yy37h08, yy66b02, za81f08, zb18f07, 
zc06e08, zd14g06, zd51h12, zd52b09, zeZSgll, ze69«)2, zf54f03, zh96e07, zv66h12, 
zs83a08 and zs83g08; 

mouse SOCS-4 characterised by mc65f04, mf42e06, mp10c10, mr81g09 and mt19h12; 

human SOCS-5 characterised by EST15B103, EST15B105, EST27530 and zf50f01; 

mouse SOCS-5 characterised by mc55a01, mh98f09, my26h12 and ve24e06; 

human SOCS-6 characterised by yf61e08, yf93a09, yg05f12, yg41f04, yg45c02, 
yh11f10, yh13b05, zc35a12, ze02h08, zl09a03, zl69e10, zn39d08 and zo39e06; 

mouse SOCS-6 characterised by mc04c05, md48ap3, mOld03, mh26b07, mhVSell, 
mh88h09, mh94h07, mi27h04, mj29c05, mp66g04, mw75g03, va53b05, vb34h02, 
vc55d07, vc59e05, vc67d03, vc68d10, vc97h01, vc99c08, vd07h03, vdOBcOl, vd09b12, 
vd19b02, vd29a04 and vd46d06; 

human SOCS-7 characterised by STS WI30171, EST00939, EST12913, yc29b05, 
yp49f10, zt10f03 and zx73g04; 

mouse SOCS-7 characterised by mj39a01 and vi52h07; 

mouse SOCS-8 characterised by mj6e09 and vj27a029; 

human SOCS-9 characterised by CSRL-82f2-u, EST114054, yy06b07, yy06g06, 
zr40c09, zr72h01, yx92c08, yx93b08 and hfe0662; 

mouse SOCS-9 characterised by me65d05; 

human SOCS-10 characterised by aa48h10, zp35h01, zp97h12, zqOShOl, zr34g05, 
EST73000 and HSDHEI005; 

mouse SOCS-10 characterised by mb14d12, mb40f06, mg89b11, mq89e12, mp03g12 
and vh53cH; 

human SOCS-11 characterised by zt24h06 and zr43b02; 

human SOCS-13 characterised by EST59161; 

mouse SOCS-13 characterised by ma39a09, me60c05, mi78g05, mk10c11, mo48g12, 
mp94a01, vb57c07 and vhOVcl 1; and 

human SOCS-14 characterised by mi75e03, vd29h11 and vd53g07; 

or a derivative or homologue of the above ESTs characterised by a nucleic acid molecule 
being capable of hybridizing to any of the listed ESTs under low stringency conditions 
at 42°C. 

15 

In another embodiment, the nucleotide sequence encodes the following amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xi]n X24 X25 X26 X27 X28 

wherein: X1 is L, I, V, M, A or P; 
X2 is any amino acid residue; 
X3 is P, T or S; 
X4 is L, I, V, M, A or P; 
X5 is any amino acid; 
X6 is any amino acid; 
X7 is L, I, V, M, A, F, Y or W; 
X8 is C, T or S; 
X9 is R, K or H; 
X10 is any amino acid; 
X11 is any amino acid; 
X12 is L, I, V, M, A or P; 
X13 is any amino acid; 
X14 is any amino acid; 
X15 is any amino acid; 
X16 is L, I, V, M, A, P, G, C, T or S; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X17 is L, I, V, M, A or P; 
X18 is any amino acid; 
X19 is any amino acid; 
X20 is L, I, V, M, A or P; 
X21 is P; 
X22 is L, I, V, M, A, P or G; 
X23 is P or N; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X24 is L, I, V, M, A or P; 
X25 is any amino acid; 
X26 is any amino acid; 
X27 is Y or F; and 
X28 is L, I, V, M, A or P. 
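The consensus above maps naturally onto a regular expression over one-letter amino acid codes. The following is a minimal sketch, not a validated motif search: the constant names and the helper `find_socs_box` are hypothetical, "any amino acid" is taken to mean any uppercase letter, and each [Xi]n spacer is rendered as 1 to 50 residues.

```python
import re
from typing import Optional, Tuple

ALIPHATIC = "[LIVMAP]"   # L, I, V, M, A or P
ANY = "[A-Z]"            # any amino acid, one-letter code
SPACER = ANY + "{1,50}"  # [Xi]n with n from 1 to 50

SOCS_BOX = re.compile(
    ALIPHATIC + ANY + "[PTS]" + ALIPHATIC + ANY + ANY      # X1-X6
    + "[LIVMAFYW][CTS][RKH]" + ANY + ANY + ALIPHATIC       # X7-X12
    + ANY * 3 + "[LIVMAPGCTS]"                              # X13-X16
    + SPACER                                                # first [Xi]n
    + ALIPHATIC + ANY + ANY + ALIPHATIC + "P[LIVMAPG][PN]"  # X17-X23
    + SPACER                                                # second [Xi]n
    + ALIPHATIC + ANY + ANY + "[YF]" + ALIPHATIC            # X24-X28
)


def find_socs_box(protein: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the leftmost match of the consensus, or None."""
    match = SOCS_BOX.search(protein.upper())
    return match.span() if match else None


# Synthetic sequence built only to satisfy the pattern (not a real SOCS protein).
demo = "LAPLAALCRAALAAAL" + "GGG" + "LAALPLP" + "GGG" + "LAAYL"
print(find_socs_box(demo))  # (0, 34)
```

Because the spacers are variable-length, `search` reports only the leftmost match; a scan for every candidate box in a long sequence would need to restart the search after each hit.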

The above sequence comparisons are preferably to the whole molecule but may also be to part 
thereof. Preferably, the comparisons are made to a contiguous series of at least about 21 
nucleotides or at least about 5 amino acids. More preferably, the comparisons are made against 
at least about 21 contiguous nucleotides or at least 7 contiguous amino acids. Comparisons may 
also be made only to the SOCS box region or to a region encompassing the protein:molecule 
interacting region, such as the SH2 domain, WD-40 repeats and/or ankyrin repeats. 
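One simple way to make a comparison "over a contiguous series" concrete is to slide a fixed-length window along both sequences and keep the best ungapped identity. The sketch below assumes this ungapped, exact-identity reading (it ignores conservative substitutions and alignment gaps); `best_window_identity` is a hypothetical helper, with `window=7` for amino acids and `window=21` for nucleotides following the text.

```python
def best_window_identity(query: str, subject: str, window: int = 7) -> float:
    """Highest percent identity between any two ungapped windows of length `window`."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(query) < window or len(subject) < window:
        raise ValueError("both sequences must be at least one window long")
    q, s = query.upper(), subject.upper()
    best = 0
    for i in range(len(q) - window + 1):
        for j in range(len(s) - window + 1):
            # Count identical residues at corresponding positions of the two windows.
            matches = sum(a == b for a, b in zip(q[i:i + window], s[j:j + window]))
            best = max(best, matches)
    return 100.0 * best / window


print(best_window_identity("ACDEFGHIK", "WWACDEFGHWW"))          # 100.0
print(round(best_window_identity("ACDEFGHIK", "ACDWFGHIK"), 1))  # 85.7
```

The double loop is quadratic in sequence length, which is adequate for domain-sized regions such as a SOCS box; genome-scale searches would use a dedicated alignment tool instead.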
Still another embodiment of the present invention contemplates an isolated polypeptide or a 
derivative, homologue, analogue or mimetic thereof comprising a SOCS box in its C-terminal 
region. 

Preferably the polypeptide further comprises a protein:molecule interacting domain such as a 
protein:DNA or protein:protein interacting domain. Preferably, this domain is located N-terminal 
of the SOCS box. It is particularly preferred for the protein:molecule interacting domain to be 
at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats. 

Preferably, the signal transduction is mediated by a cytokine selected from EPO, TPO, G-CSF, 
GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF. 
Preferred cytokines are IL-6, LIF, OSM, IFN-γ or thrombopoietin. 

More preferably, the protein comprises a SOCS box having the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xi]n X24 X25 X26 X27 X28 

wherein: 
X1 is L, I, V, M, A or P; 
X2 is any amino acid residue; 
X3 is P, T or S; 
X4 is L, I, V, M, A or P; 
X5 is any amino acid; 
X6 is any amino acid; 
X7 is L, I, V, M, A, F, Y or W; 
X8 is C, T or S; 
X9 is R, K or H; 
X10 is any amino acid; 
X11 is any amino acid; 
X12 is L, I, V, M, A or P; 
X13 is any amino acid; 
X14 is any amino acid; 
X15 is any amino acid; 
X16 is L, I, V, M, A, P, G, C, T or S; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X17 is L, I, V, M, A or P; 
X18 is any amino acid; 
X19 is any amino acid; 
X20 is L, I, V, M, A or P; 
X21 is P; 
X22 is L, I, V, M, A, P or G; 
X23 is P or N; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X24 is L, I, V, M, A or P; 
X25 is any amino acid; 
X26 is any amino acid; 
X27 is Y or F; and 
X28 is L, I, V, M, A or P. 

Still another embodiment provides an isolated polypeptide or a derivative, homologue, analogue 
or mimetic thereof comprising a sequence of amino acids substantially as set forth in SEQ ID 
NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID NO:10 
(hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18 (mSOCS5), 
SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29 (mSOCS8), SEQ ID 
NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44 (mSOCS14), SEQ ID NO:46 
(mSOCS15) and SEQ ID NO:48 (hSOCS15), or an amino acid sequence having at least 15% 
similarity to all or a part of the listed sequences. 
Preferred nucleotide percentage similarities include at least about 20%, at least about 40%, at 
least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90% 
or above, such as 93%, 95%, 98% or 99%. 

Preferred amino acid similarities include at least about 20%, at least about 30%, at least about 
40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least 
about 90%, at least about 95%, at least about 97% or 98% or above. 

As stated above, similarity may be measured against an entire molecule or a region comprising 
at least 21 nucleotides or at least 7 amino acids. Preferably, similarity is measured in a conserved 
region such as an SH2 domain, WD-40 repeats, ankyrin repeats or other protein:molecule 
interacting domains, or a SOCS box. 

The term "similarity" includes exact identity between sequences or, where the sequences differ, 
different amino acids that are related to each other at the structural, functional, biochemical and/or 
conformational levels. 
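To show how "similarity" differs from identity, the sketch below scores two pre-aligned sequences, counting a position as similar when the residues are identical or fall in the same residue class. The classes are an assumption made for illustration: "LIVMAP", "CTS" and "RKH" mirror groupings used in the SOCS box definition given earlier, while "FYW" (aromatic) is added here and does not come from the text. The helper `percent_similarity` is hypothetical; a real analysis would normally use an established substitution matrix and alignment program.

```python
# Hypothetical residue classes (see lead-in); each residue belongs to at most one class.
RESIDUE_CLASSES = ("LIVMAP", "CTS", "RKH", "FYW")
CLASS_OF = {aa: idx for idx, group in enumerate(RESIDUE_CLASSES) for aa in group}


def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions that are identical or fall in the same residue class.

    Both inputs must already be aligned to the same length; '-' marks a gap and
    gap positions are counted as dissimilar.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        if x == y or (x in CLASS_OF and CLASS_OF[x] == CLASS_OF.get(y)):
            similar += 1
    return 100.0 * similar / len(aligned_a)


print(percent_similarity("LCRY", "ISKF"))  # 100.0: every pair is a within-class substitution
print(percent_similarity("LCRY", "LC--"))  # 50.0
```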

The nucleic acid molecule may be isolated from any animal such as humans, primates, livestock 
animals (e.g. horses, cows, sheep, donkeys, pigs), laboratory test animals (e.g. mice, rats, rabbits, 
hamsters, guinea pigs), companion animals (e.g. dogs, cats) or captive wild animals (e.g. deer, 
foxes, kangaroos). 

The term "derivatives", or its singular form "derivative", whether in relation to a nucleic acid 
molecule or a protein, includes parts, mutants, fragments and analogues as well as hybrid or 
fusion molecules and glycosylation variants. Particularly useful derivatives comprise single or 
multiple amino acid substitutions, deletions and/or additions to the SOCS amino acid sequence. 

Preferably, the derivatives have functional activity or alternatively act as antagonists or agonists. 
The present invention further extends to homologues of SOCS, which include the functionally or 
structurally related molecules from different animal species. The present invention also 
encompasses analogues and mimetics. Mimetics include a class of molecule generally but not 
necessarily having a non-amino acid structure and which functionally are capable of acting in an 
analogous manner to the protein for which it is a mimic, in this case, a SOCS. Mimetics may 
comprise a carbohydrate, aromatic ring, lipid or other complex chemical structure or may also 
be proteinaceous in composition. Mimetics as well as agonists and antagonists contemplated 
herein are conveniently located through systematic searching of environments, such as coral, 
marine and freshwater river beds, flora and microorganisms. This is sometimes referred to as 
natural product screening. Alternatively, libraries of synthetic chemical compounds may be 
screened for potentially useful molecules. 

As stated above, the present invention contemplates agonists and antagonists of the SOCS. One 
example of an antagonist is an antisense oligonucleotide sequence. Useful oligonucleotides are 
those which have a nucleotide sequence complementary to at least a portion of the protein-coding 
or "sense" sequence of the nucleotide sequence. These antisense nucleotides can be 
used to effect the specific inhibition of gene expression. The antisense approach can cause 
inhibition of gene expression, apparently by forming an anti-parallel duplex by complementary 
base pairing between the antisense construct and the targeted mRNA, presumably resulting in 
hybridisation arrest of translation. Ribozymes and co-suppression molecules may also be used. 
Antisense and other nucleic acid molecules may first need to be chemically modified to permit 
penetration of cell membranes and/or to increase their serum half-life or otherwise make them 
more stable for in vivo administration. Antibodies may also act as either antagonists or agonists, 
although they are more useful in diagnostic applications or in the purification of SOCS proteins. 
Antagonists and agonists may also be identified following natural product screening or 
screening of libraries of chemical compounds or may be derivatives or analogues of the SOCS 
molecules. 
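The sequence logic of an antisense oligonucleotide is the reverse complement of the chosen stretch of sense sequence, so that it pairs anti-parallel with the target mRNA. The sketch below derives only that sequence; target-site selection and the chemical modifications mentioned above are outside its scope. The helper `antisense_oligo` and the example fragment are hypothetical and are not SOCS sequences.

```python
# Complement of each base for a DNA antisense oligonucleotide; U (mRNA) pairs with A.
_COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(sense: str, start: int, length: int) -> str:
    """Return the DNA reverse complement of sense[start:start + length], written 5'->3'."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length > 0")
    segment = sense.upper()[start:start + length]
    if len(segment) != length:
        raise ValueError("requested region runs past the end of the sense sequence")
    try:
        # Reverse, then complement: the antisense strand pairs anti-parallel with the target.
        return "".join(_COMPLEMENT[base] for base in reversed(segment))
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r}") from None


target = "AUGGCCAAGCUU"  # hypothetical mRNA fragment
print(antisense_oligo(target, 0, 6))  # GGCCAT
```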

Accordingly, the present invention extends to analogues of the SOCS proteins of the present 
invention. Analogues may be used, for example, in the treatment or prophylaxis of cytokine-mediated 
dysfunction such as autoimmunity, immune suppression or hyperactive immunity or 
other conditions including but not limited to dysfunctions in the haemopoietic, endocrine, hepatic 
and neural systems. Dysfunctions mediated by other signal transducing elements such as 
hormones or endogenous or exogenous molecules, antigens, microbes and microbial products, 
viruses or components thereof, ions and parasites are also contemplated by the 
present invention. 

Analogues of the proteins contemplated herein include, but are not limited to, modifications to 
side chains, incorporation of unnatural amino acids and/or their derivatives during peptide, 
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose 
conformational constraints on the proteinaceous molecule or their analogues. 

Examples of side chain modifications contemplated by the present invention include 
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde 
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic 
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups 
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic 
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with pyridoxal-5-phosphate 
followed by reduction with NaBH4. 

The guanidine group of arginine residues may be modified by the formation of heterocyclic 
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal. 

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation 
followed by subsequent derivatisation, for example, to a corresponding amide. 

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid 
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides 
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted 
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate, 
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and 
other mercurials; carbamoylation with cyanate at alkaline pH. 

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or 
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides. 
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form 
a 3-nitrotyrosine derivative. 

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with 
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate. 
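Several of the side chain modifications listed above add a fixed chemical group, so one illustrative check that a modification has occurred is the expected monoisotopic mass shift of the modified residue. The sketch below computes those shifts from the net change in elemental composition; the compositions are assumed from the standard chemistry of each reagent (they are not given in the text), and the names `MODIFICATIONS` and `delta_mass` are hypothetical.

```python
from typing import Dict

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Net change in elemental composition per modified residue (assumed standard chemistry).
MODIFICATIONS: Dict[str, Dict[str, int]] = {
    "acetylation (acetic anhydride)": {"C": 2, "H": 2, "O": 1},
    "carbamoylation (cyanate)": {"C": 1, "H": 1, "N": 1, "O": 1},
    "carboxymethylation (iodoacetic acid)": {"C": 2, "H": 2, "O": 2},
    "carbamidomethylation (iodoacetamide)": {"C": 2, "H": 3, "N": 1, "O": 1},
    "nitration of tyrosine (tetranitromethane)": {"H": -1, "N": 1, "O": 2},
}


def delta_mass(composition: Dict[str, int]) -> float:
    """Monoisotopic mass change (Da) for a net change in elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


for name, composition in MODIFICATIONS.items():
    print(f"{name}: {delta_mass(composition):+.4f} Da")
```

For example, acetylation works out to about +42.0106 Da and nitration of tyrosine to about +44.9851 Da per modified residue.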

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis 
include, but are not limited to, use of norleucine, 4-amino butyric acid, 4-amino-3-hydroxy-5-phenylpentanoic 
acid, 6-aminohexanoic acid, t-butylglycine, norvaline, phenylglycine, ornithine, 
sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine and/or D-isomers of 
amino acids. A list of unnatural amino acids contemplated herein is shown in Table 3. 
TABLE 3 

Non-conventional amino acid                          Code 

α-aminobutyric acid                                  Abu 
L-N-methylalanine                                    Nmala 
α-amino-α-methylbutyrate                             Mgabu 
L-N-methylarginine                                   Nmarg 
aminocyclopropane-carboxylate                        Cpro 
L-N-methylasparagine                                 Nmasn 
L-N-methylaspartic acid                              Nmasp 
aminoisobutyric acid                                 Aib 
L-N-methylcysteine                                   Nmcys 
aminonorbornyl-carboxylate                           Norb 
L-N-methylglutamine                                  Nmgln 
L-N-methylglutamic acid                              Nmglu 
cyclohexylalanine                                    Chexa 
L-N-methylhistidine                                  Nmhis 
cyclopentylalanine                                   Cpen 
L-N-methylisoleucine                                 Nmile 
D-alanine                                            Dal 
L-N-methylleucine                                    Nmleu 
D-arginine                                           Darg 
L-N-methyllysine                                     Nmlys 
D-aspartic acid                                      Dasp 
L-N-methylmethionine                                 Nmmet 
D-cysteine                                           Dcys 
L-N-methylnorleucine                                 Nmnle 
D-glutamine                                          Dgln 
L-N-methylnorvaline                                  Nmnva 
D-glutamic acid                                      Dglu 
L-N-methylornithine                                  Nmorn 
D-histidine                                          Dhis 
L-N-methylphenylalanine                              Nmphe 
D-isoleucine                                         Dile 
L-N-methylproline                                    Nmpro 
D-leucine                                            Dleu 
L-N-methylserine                                     Nmser 
D-lysine                                             Dlys 
L-N-methylthreonine                                  Nmthr 
D-methionine                                         Dmet 
L-N-methyltryptophan                                 Nmtrp 
D-ornithine                                          Dorn 
L-N-methyltyrosine                                   Nmtyr 
D-phenylalanine                                      Dphe 
L-N-methylvaline                                     Nmval 
D-proline                                            Dpro 
L-N-methylethylglycine                               Nmetg 
D-serine                                             Dser 
L-N-methyl-t-butylglycine                            Nmtbug 
D-threonine                                          Dthr 
L-norleucine                                         Nle 
D-tryptophan                                         Dtrp 
L-norvaline                                          Nva 
D-tyrosine                                           Dtyr 
α-methyl-aminoisobutyrate                            Maib 
D-valine                                             Dval 
α-methyl-γ-aminobutyrate                             Mgabu 
D-α-methylalanine                                    Dmala 
α-methylcyclohexylalanine                            Mchexa 
D-α-methylarginine                                   Dmarg 
α-methylcyclopentylalanine                           Mcpen 
D-α-methylasparagine                                 Dmasn 
α-methyl-α-naphthylalanine                           Manap 
D-α-methylaspartate                                  Dmasp 
α-methylpenicillamine                                Mpen 
D-α-methylcysteine                                   Dmcys 
N-(4-aminobutyl)glycine                              Nglu 
D-α-methylglutamine                                  Dmgln 
N-(2-aminoethyl)glycine                              Naeg 
D-α-methylhistidine                                  Dmhis 
N-(3-aminopropyl)glycine                             Norn 
D-α-methylisoleucine                                 Dmile 
N-amino-α-methylbutyrate                             Nmaabu 
D-α-methylleucine                                    Dmleu 
α-naphthylalanine                                    Anap 
D-α-methyllysine                                     Dmlys 
N-benzylglycine                                      Nphe 
D-α-methylmethionine                                 Dmmet 
N-(2-carbamylethyl)glycine                           Ngln 
D-α-methylornithine                                  Dmorn 
N-(carbamylmethyl)glycine                            Nasn 
D-α-methylphenylalanine                              Dmphe 
N-(2-carboxyethyl)glycine                            Nglu 
D-α-methylproline                                    Dmpro 
N-(carboxymethyl)glycine                             Nasp 
D-α-methylserine                                     Dmser 
N-cyclobutylglycine                                  Ncbut 
D-α-methylthreonine                                  Dmthr 
N-cycloheptylglycine                                 Nchep 
D-α-methyltryptophan                                 Dmtrp 
N-cyclohexylglycine                                  Nchex 
D-α-methyltyrosine                                   Dmty 
N-cyclodecylglycine                                  Ncdec 
D-α-methylvaline                                     Dmval 
N-cyclododecylglycine                                Ncdod 
D-N-methylalanine                                    Dnmala 
N-cyclooctylglycine                                  Ncoct 
D-N-methylarginine                                   Dnmarg 
N-cyclopropylglycine                                 Ncpro 
D-N-methylasparagine                                 Dnmasn 
N-cycloundecylglycine                                Ncund 
D-N-methylaspartate                                  Dnmasp 
N-(2,2-diphenylethyl)glycine                         Nbhm 
D-N-methylcysteine                                   Dnmcys 
N-(3,3-diphenylpropyl)glycine                        Nbhe 
D-N-methylglutamine                                  Dnmgln 
N-(3-guanidinopropyl)glycine                         Narg 
D-N-methylglutamate                                  Dnmglu 
N-(1-hydroxyethyl)glycine                            Nthr 
D-N-methylhistidine                                  Dnmhis 
N-(hydroxyethyl)glycine                              Nser 
D-N-methylisoleucine                                 Dnmile 
N-(imidazolylethyl)glycine                           Nhis 
D-N-methylleucine                                    Dnmleu 
N-(3-indolylethyl)glycine                            Nhtrp 
D-N-methyllysine                                     Dnmlys 
N-methyl-γ-aminobutyrate                             Nmgabu 
N-methylcyclohexylalanine                            Nmchexa 
D-N-methylmethionine                                 Dnmmet 
D-N-methylornithine                                  Dnmorn 
N-methylcyclopentylalanine                           Nmcpen 
N-methylglycine                                      Nala 
D-N-methylphenylalanine                              Dnmphe 
N-methylaminoisobutyrate                             Nmaib 
D-N-methylproline                                    Dnmpro 
N-(1-methylpropyl)glycine                            Nile 
D-N-methylserine                                     Dnmser 
N-(2-methylpropyl)glycine                            Nleu 
D-N-methylthreonine                                  Dnmthr 
D-N-methyltryptophan                                 Dnmtrp 
N-(1-methylethyl)glycine                             Nval 
D-N-methyltyrosine                                   Dnmtyr 
N-methyl-α-naphthylalanine                           Nmanap 
D-N-methylvaline                                     Dnmval 
N-methylpenicillamine                                Nmpen 
γ-aminobutyric acid                                  Gabu 
N-(p-hydroxyphenyl)glycine                           Nhtyr 
L-t-butylglycine                                     Tbug 
N-(thiomethyl)glycine                                Ncys 
L-ethylglycine                                       Etg 
penicillamine                                        Pen 
L-homophenylalanine                                  Hphe 
L-α-methylalanine                                    Mala 
L-α-methylarginine                                   Marg 
L-α-methylasparagine                                 Masn 
L-α-methylaspartate                                  Masp 
L-α-methyl-t-butylglycine                            Mtbug 
L-α-methylcysteine                                   Mcys 
L-methylethylglycine                                 Metg 
L-α-methylglutamine                                  Mgln 
L-α-methylglutamate                                  Mglu 
L-α-methylhistidine                                  Mhis 
L-α-methylhomophenylalanine                          Mhphe 
L-α-methylisoleucine                                 Mile 
N-(2-methylthioethyl)glycine                         Nmet 
L-α-methylleucine                                    Mleu 
L-α-methyllysine                                     Mlys 
L-α-methylmethionine                                 Mmet 
L-α-methylnorleucine                                 Mnle 
L-α-methylnorvaline                                  Mnva 
L-α-methylornithine                                  Morn 
L-α-methylphenylalanine                              Mphe 
L-α-methylproline                                    Mpro 
L-α-methylserine                                     Mser 
L-α-methylthreonine                                  Mthr 
L-α-methyltryptophan                                 Mtrp 
L-α-methyltyrosine                                   Mtyr 
L-α-methylvaline                                     Mval 
L-N-methylhomophenylalanine                          Nmhphe 
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine       Nnbhm 
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine      Nnbhe 
1-carboxy-1-(2,2-diphenylethylamino)cyclopropane     Nmbc 



Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional 
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n=1 to n=6, 
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually 
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group-specific 
reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In addition, 
peptides can be conformationally constrained by, for example, incorporation of Cα and 
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and 
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming 
an amide bond between the N and C termini, between two side chains or between a side chain 
and the N or C terminus. 

These types of modifications may be important to stabilise the cytokines if administered to an 
individual or for use as a diagnostic reagent. 

Other derivatives contemplated by the present invention include a range of glycosylation 
variants from a completely unglycosylated molecule to a modified glycosylated molecule. 
Altered glycosylation patterns may result from expression of recombinant molecules in different 
host cells. 

Another embodiment of the present invention contemplates a method for modulating 
expression of a SOCS protein in a mammal, said method comprising contacting a gene encoding 
a SOCS or a factor/element involved in controlling expression of the SOCS gene with an 
effective amount of a modulator of SOCS expression for a time and under conditions sufficient 
to up-regulate or down-regulate or otherwise modulate expression of SOCS. An example of 
a modulator is a cytokine such as IL-6 or other transcription regulators of SOCS expression. 



Expression includes transcription or translation or both. 

Another aspect of the present invention contemplates a method of modulating activity of SOCS 
in a mammal, said method comprising administering to said mammal a modulating effective 
amount of a molecule for a time and under conditions sufficient to increase or decrease SOCS 
activity. The molecule may be a proteinaceous molecule or a chemical entity and may also be 
a derivative of SOCS or a chemical analogue or truncation mutant of SOCS. 

A further aspect of the present invention provides a method of inducing synthesis of a SOCS 
or transcription/translation of a SOCS comprising contacting a cell containing a SOCS gene 
with an effective amount of a cytokine capable of inducing said SOCS for a time and under 
conditions sufficient for said SOCS to be produced. For example, SOCS1 may be induced by 
IL-6. 

Still a further aspect of the present invention contemplates a method of modulating levels of a 
SOCS protein in a cell, said method comprising contacting a cell containing a SOCS gene with 
an effective amount of a modulator of SOCS gene expression or SOCS protein activity for a 
time and under conditions sufficient to modulate levels of said SOCS protein. 

Yet a further aspect of the present invention contemplates a method of modulating signal 
transduction in a cell containing a SOCS gene comprising contacting said cell with an effective 
amount of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient 
to modulate signal transduction. 

Even yet a further aspect of the present invention contemplates a method of influencing 
interaction between cells wherein at least one cell carries a SOCS gene, said method comprising 
contacting the cell carrying the SOCS gene with an effective amount of a modulator of SOCS 
gene expression or SOCS protein activity for a time sufficient to modulate signal transduction. 

As stated above, the present invention contemplates a range of mimetics or small molecules 
capable of acting as agonists or antagonists of the SOCS. Such molecules may be obtained 
from natural product screening such as from coral, soil, plants or the ocean or Antarctic 
environments. Alternatively, peptide, polypeptide or protein libraries or chemical libraries may 
be readily screened. For example, M1 cells expressing a SOCS do not undergo differentiation 
in the presence of IL-6. This system can be used to screen for molecules which permit 
differentiation in the presence of IL-6 and a SOCS. A range of test cells may be prepared to 
screen for antagonists and agonists for a range of cytokines. Such molecules are preferably 
small molecules and may be of amino acid origin or of chemical origin. SOCS molecules 
interacting with signalling proteins (e.g. JAKs) provide molecular screens to detect molecules 
which interfere with or promote this interaction. One such screening protocol involves natural 
product screening. 
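The M1 differentiation screen described above can be scored by normalising each test well between two controls and calling hits above a cut-off. The sketch below is hypothetical throughout: it assumes the readout is the percentage of differentiated cells or colonies per well, that SOCS-expressing M1 cells with IL-6 and no test compound form the 0% control, that parental M1 cells with IL-6 form the 100% control, and that 50% restoration is the hit threshold. None of these values or names come from the specification.

```python
from typing import Dict, List


def percent_restoration(value: float, neg_ctrl: float, pos_ctrl: float) -> float:
    """Scale a readout so the SOCS-expressing control is 0% and the parental control is 100%."""
    span = pos_ctrl - neg_ctrl
    if span <= 0:
        raise ValueError("positive control must exceed negative control")
    return 100.0 * (value - neg_ctrl) / span


def call_hits(readouts: Dict[str, float], neg_ctrl: float, pos_ctrl: float,
              threshold: float = 50.0) -> List[str]:
    """Names of test compounds whose normalised readout reaches the threshold."""
    return sorted(name for name, value in readouts.items()
                  if percent_restoration(value, neg_ctrl, pos_ctrl) >= threshold)


# Hypothetical percentages of differentiated colonies, all in the presence of IL-6.
plate = {"cmpd_A": 55.0, "cmpd_B": 20.0, "cmpd_C": 90.0}
print(call_hits(plate, neg_ctrl=10.0, pos_ctrl=90.0))  # ['cmpd_A', 'cmpd_C']
```

Normalising to on-plate controls, rather than using raw percentages, keeps the cut-off comparable between plates whose baseline differentiation varies.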

Accordingly, the present invention contemplates a pharmaceutical composition comprising 
SOCS or a derivative thereof or a modulator of SOCS expression or SOCS activity and one or 
more pharmaceutically acceptable carriers and/or diluents. These components are referred to 
as the "active ingredients". These and other aspects of the present invention apply to any SOCS 
molecules such as but not limited to SOCS1 to SOCS15. 

The pharmaceutical forms containing active ingredients suitable for injectable use include sterile 
aqueous solutions (where water soluble) and sterile powders for the extemporaneous preparation 
of sterile injectable solutions. The form must be stable under the conditions of manufacture and storage 
and must be preserved against the contaminating action of microorganisms such as bacteria and 
fungi. The carrier can be a solvent or dispersion medium containing, for example, water, 
ethanol, polyol (for example, glycerol, propylene glycol and liquid polyethylene glycol, and the 
like), suitable mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for 
example, by the use of a coating such as lecithin, by the maintenance of the required particle size 
in the case of dispersion and by the use of surfactants. Prevention of the action of 
microorganisms can be brought about by various antibacterial and antifungal agents, for 
example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, 
it will be preferable to include isotonic agents, for example, sugars or sodium chloride. 

Prolonged absorption of the injectable compositions can be brought about by the use in the 
compositions of agents delaying absorption, for example, aluminum monostearate and gelatin. 
Sterile injectable solutions are prepared by incorporating the active compounds in the required 
amount in the appropriate solvent with various of the other ingredients enumerated above, as 
required, followed by filtered sterilization. In the case of sterile powders for the preparation 
of sterile injectable solutions, the preferred methods of preparation are vacuum drying and the 
freeze-drying technique, which yield a powder of the active ingredient plus any additional 
desired ingredient from a previously sterile-filtered solution thereof. 

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft
shell gelatin capsules, or compressed into tablets. For oral therapeutic administration,
the active compound may be incorporated with excipients and used in the form of ingestible
tablets, buccal tablets, troches, capsules, elixirs, suspensions, syrups, wafers and the like. Such
compositions and preparations should contain at least 1% by weight of active compound. The
percentage of the compositions and preparations may, of course, be varied and may
conveniently be between about 5 to about 80% of the weight of the unit. The amount of active
compound in such therapeutically useful compositions is such that a suitable dosage will be
obtained. Preferred compositions or preparations according to the present invention are
prepared so that an oral dosage unit form contains between about 0.1 µg and 2000 mg of active
compound.
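
The weight-percentage limits above reduce to simple arithmetic. The following is a minimal sketch that assumes the total mass of a dosage unit and its percentage of active compound are known. The function names and the 500 mg tablet are hypothetical. The only limits checked are the ones stated above: at least 1% by weight, and about 0.1 µg to 2000 mg of active compound per oral dosage unit.

```python
def active_mass_mg(unit_mass_mg: float, percent_active: float) -> float:
    """Mass of active compound (mg) in a dosage unit of the given total mass (mg)."""
    if unit_mass_mg <= 0 or not 0 < percent_active <= 100:
        raise ValueError("unit mass must be positive and percentage in (0, 100]")
    return unit_mass_mg * percent_active / 100.0


def meets_oral_limits(unit_mass_mg: float, percent_active: float) -> bool:
    """Check a hypothetical formulation against the stated ranges:
    at least 1% w/w active, and 0.1 µg (1e-4 mg) to 2000 mg active per unit."""
    active = active_mass_mg(unit_mass_mg, percent_active)
    return percent_active >= 1.0 and 1e-4 <= active <= 2000.0


# Hypothetical 500 mg tablet containing 20% w/w active compound
print(active_mass_mg(500, 20))      # 100.0
print(meets_oral_limits(500, 20))   # True
print(meets_oral_limits(500, 0.5))  # False: below the 1% minimum
```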

The tablets, troches, pills, capsules and the like may also contain the components listed
hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin may be added, or a flavouring agent such as peppermint, oil of wintergreen or cherry
flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of
the above type, a liquid carrier. Various other materials may be present as coatings or to
otherwise modify the physical form of the dosage unit. For instance, tablets, pills or capsules
may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound,
sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and
flavouring such as cherry or orange flavour. Of course, any material used in preparing any
dosage unit form should be pharmaceutically pure and substantially non-toxic in the amounts
employed. In addition, the active compound(s) may be incorporated into sustained-release
preparations and formulations.

The present invention also extends to forms suitable for topical application such as creams,
lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and
the like. The use of such media and agents for pharmaceutically active substances is well known
in the art. Except insofar as any conventional medium or agent is incompatible with the active
ingredient, its use in the therapeutic compositions is contemplated. Supplementary active
ingredients can also be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specification for
the novel dosage unit forms of the invention is dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired, as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 µg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 µg to about 2000 mg/ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.
The effective amount may also be conveniently expressed in terms of an amount per kg of body
weight. For example, from about 0.01 ng to about 10,000 mg/kg body weight may be
administered.
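
A weight-based dose becomes a total amount by multiplying by body weight. That amount can then be divided by the concentration of the carrier solution to give a volume. The following is a minimal sketch. The 0.5 mg/kg dose, the 70 kg subject and the 10 mg/ml solution are hypothetical values, and the only range checked is the 0.01 ng to 10,000 mg/kg range stated above.

```python
MIN_MG_PER_KG = 1e-8      # 0.01 ng/kg expressed in mg/kg
MAX_MG_PER_KG = 10_000.0  # 10,000 mg/kg


def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total amount (mg) to administer for a weight-based dose."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not MIN_MG_PER_KG <= dose_mg_per_kg <= MAX_MG_PER_KG:
        raise ValueError("dose outside the range stated in the text")
    return dose_mg_per_kg * body_weight_kg


def volume_ml(total_mg: float, concentration_mg_per_ml: float) -> float:
    """Volume (ml) of a carrier solution that delivers total_mg."""
    if concentration_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return total_mg / concentration_mg_per_ml


dose = total_dose_mg(0.5, 70)  # hypothetical 0.5 mg/kg for a 70 kg subject
print(dose)                    # 35.0
print(volume_ml(dose, 10))     # 3.5 (hypothetical 10 mg/ml solution)
```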

The pharmaceutical composition may also comprise genetic molecules such as a vector capable
of transfecting target cells, where the vector carries a nucleic acid molecule capable of
modulating SOCS expression or SOCS activity. The vector may, for example, be a viral vector.
In this regard, a range of gene therapies are contemplated by the present invention, including
isolating certain cells, genetically manipulating them and returning them to the same subject or
to a genetically related or similar subject.

Still another aspect of the present invention is directed to antibodies to SOCS and its
derivatives. Such antibodies may be monoclonal or polyclonal and may be selected from
naturally occurring antibodies to SOCS or may be specifically raised to SOCS or derivatives
thereof. In the case of the latter, SOCS or its derivatives may first need to be associated with
a carrier molecule. The antibodies and/or recombinant SOCS or its derivatives of the present
invention are particularly useful as therapeutic or diagnostic agents.

For example, SOCS and its derivatives can be used to screen for naturally occurring antibodies
to SOCS. These may occur, for example, in some autoimmune diseases. Alternatively, specific
antibodies can be used to screen for SOCS. Techniques for such assays are well known in the
art and include, for example, sandwich assays and ELISA. Knowledge of SOCS levels may be
important for diagnosis of certain cancers or a predisposition to cancers, for monitoring
cytokine-mediated cellular responsiveness or for monitoring certain therapeutic protocols.

Antibodies to SOCS of the present invention may be monoclonal or polyclonal. Alternatively,
fragments of antibodies, such as Fab fragments, may be used. Furthermore, the present invention
extends to recombinant and synthetic antibodies and to antibody hybrids. A "synthetic
antibody" is considered herein to include fragments and hybrids of antibodies. The antibodies
of this aspect of the present invention are particularly useful for immunotherapy and may also
be used as a diagnostic tool for assessing apoptosis or monitoring the progress of a therapeutic
regimen.

For example, specific antibodies can be used to screen for SOCS proteins. This would be
important, for example, as a means of screening for levels of SOCS in a cell extract or other
biological fluid, or of purifying SOCS made by recombinant means from culture supernatant
fluid. Techniques for the assays contemplated herein are known in the art and include, for
example, sandwich assays and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal,
fragments of antibodies or synthetic antibodies) directed to the first-mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays, or a first
antibody may be used with a commercially available anti-immunoglobulin antibody. An
antibody as contemplated herein includes any antibody specific to any region of SOCS.

Both polyclonal and monoclonal antibodies are obtainable by immunization with the enzyme
or protein, and either type is utilizable for immunoassays. The methods of obtaining both types
of sera are well known in the art. Polyclonal sera are less preferred but are relatively easily
prepared by injection of a suitable laboratory animal with an effective amount of SOCS, or
antigenic parts thereof, collecting serum from the animal and isolating specific sera by any of
the known immunoadsorbent techniques. Although antibodies produced by this method are
utilizable in virtually any type of immunoassay, they are generally less favoured because of the
potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the
ability to produce them in large quantities and the homogeneity of the product. The preparation
of hybridoma cell lines for monoclonal antibody production, derived by fusing an immortal cell
line and lymphocytes sensitized against the immunogenic preparation, can be done by
techniques which are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting SOCS in a
biological sample from a subject, said method comprising contacting said biological sample with
an antibody specific for SOCS or its derivatives or homologues for a time and under conditions
sufficient for an antibody-SOCS complex to form and then detecting said complex.

Detection of SOCS may be accomplished in a number of ways, such as by Western blotting
and ELISA procedures. A wide range of immunoassay techniques are available, as can be seen
by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653. These, of course, include
both single-site and two-site or "sandwich" assays of the non-competitive types, as well as
the traditional competitive binding assays. These assays also include direct binding of a
labelled antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for
use in the present invention. A number of variations of the sandwich assay technique exist, and
all are intended to be encompassed by the present invention. Briefly, in a typical forward assay,
an unlabelled antibody is immobilized on a solid substrate and the sample to be tested is
brought into contact with the bound molecule. After incubation for a period of time sufficient
to allow formation of an antibody-antigen complex, a second antibody specific to the antigen,
labelled with a reporter molecule capable of producing a detectable signal, is then added and
incubated, allowing time sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted material is washed away, and the presence
of the antigen is determined by observation of a signal produced by the reporter molecule. The
results may either be qualitative, by simple observation of the visible signal, or may be
quantitated by comparison with a control sample containing known amounts of hapten.
Variations on the forward assay include a simultaneous assay, in which both sample and
labelled antibody are added simultaneously to the bound antibody. These techniques are well
known to those skilled in the art, including any minor variations as will be readily apparent. In
accordance with the present invention, the sample is one which might contain SOCS, including
cell extract, tissue biopsy or possibly serum, saliva, mucosal secretions, lymph, tissue fluid and
respiratory fluid. The sample is, therefore, generally a biological sample comprising biological
fluid but also extends to fermentation fluid and supernatant fluid such as from a cell culture.
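
Quantitation by comparison with control samples of known antigen content is usually done with a standard curve. The following is a minimal sketch that uses piecewise-linear interpolation between the two bracketing standards. This is a deliberate simplification: many laboratories instead fit a four-parameter logistic curve. The standard values are hypothetical, and the signal is assumed to rise monotonically with concentration.

```python
from bisect import bisect_left


def interpolate_concentration(signal, standards):
    """Estimate antigen concentration from a reporter signal.

    standards: (concentration, signal) pairs measured on control samples
    containing known amounts of antigen. The signal is assumed to increase
    monotonically with concentration; values are linearly interpolated
    between the two bracketing standards.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the range of the standard curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)


# Hypothetical standards: (ng/ml of antigen, absorbance)
standards = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
print(interpolate_concentration(0.35, standards))  # about 15 ng/ml
```

Samples whose signal falls outside the range of the standards are rejected rather than extrapolated. This is because the assay response is only known between the measured controls.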

In the typical forward sandwich assay, a first antibody having specificity for SOCS or
antigenic parts thereof is either covalently or passively bound to a solid surface. The solid
surface is typically glass or a polymer, the most commonly used polymers being cellulose,
polyacrylamide, nylon, polystyrene, polyvinyl chloride or polypropylene. The solid supports
may be in the form of tubes, beads, discs or microplates, or any other surface suitable for
conducting an immunoassay. The binding processes are well known in the art and generally
consist of cross-linking, covalent binding or physical adsorption; the polymer-antibody
complex is then washed in preparation for the test sample. An aliquot of the sample to be
tested is then added to the solid phase complex and incubated for a sufficient period of time
(e.g. 2-40 minutes, or overnight if more convenient) and under suitable conditions (e.g. room
temperature to 37°C) to allow binding of any antigen present in the sample to the antibody.
Following the incubation period, the antibody subunit solid phase is washed and dried and
incubated with a second antibody specific for a portion of the hapten. The second antibody is
linked to a reporter molecule which is used to indicate the binding of the second antibody to
the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and
then exposing the immobilized target to specific antibody, which may or may not be labelled
with a reporter molecule. Depending on the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by direct labelling with the antibody.
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex.
The complex is detected by the signal emitted by the reporter molecule.

By "reporter molecule" as used in the present specification is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most
commonly used reporter molecules in this type of assay are enzymes, fluorophores,
radionuclide-containing molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield
a fluorescent product, rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further
quantitated, usually spectrophotometrically, to give an indication of the amount of hapten which
was present in the sample. "Reporter molecule" also extends to use of cell agglutination or
inhibition of agglutination, such as red blood cells on latex beads, and the like.
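
When the colour change is read spectrophotometrically, each sample still has to be called positive or negative. The text does not specify a rule for this. A common convention is to use a cutoff of the mean of the negative controls plus three standard deviations, and the sketch below assumes that convention. The function names and the absorbance readings are hypothetical.

```python
from statistics import mean, stdev


def positivity_cutoff(negative_controls, k=3.0):
    """Cutoff = mean of negative-control readings + k standard deviations."""
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative controls to estimate spread")
    return mean(negative_controls) + k * stdev(negative_controls)


def is_positive(sample_replicates, negative_controls, k=3.0):
    """Call a sample positive if its mean reading exceeds the cutoff."""
    return mean(sample_replicates) > positivity_cutoff(negative_controls, k)


negatives = [0.050, 0.060, 0.055]              # hypothetical absorbance readings
print(is_positive([0.40, 0.42], negatives))    # True
print(is_positive([0.058, 0.060], negatives))  # False
```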

Alternatively, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the
fluorescently labelled antibody is allowed to bind to the first antibody-hapten complex. After
washing off the unbound reagent, the remaining tertiary complex is then exposed to light of the
appropriate wavelength; the fluorescence observed indicates the presence of the hapten of
interest. Immunofluorescence and EIA techniques are both very well established in the art and
are particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

The present invention also contemplates genetic assays, such as those involving PCR analysis, to
detect the SOCS gene or its derivatives. Alternative methods, or methods used in conjunction,
include direct nucleotide sequencing or mutation scanning such as single-stranded conformation
polymorphism (SSCP) analysis, specific oligonucleotide hybridisation, and methods such as
direct protein truncation tests.
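
A protein truncation test looks for a shortened protein product caused by a premature stop codon. In the laboratory the test is performed by in vitro transcription and translation. The sketch below is only an in silico analogue of the same idea. It assumes a coding sequence that begins at the initiation codon and uses the standard genetic code. The sequences shown are short hypothetical examples, not SOCS sequences.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds: str) -> str:
    """Translate codon by codon until the first stop codon (not included)."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_truncated(cds: str, expected_length_aa: int) -> bool:
    """True if translation stops before the expected protein length."""
    return len(translate(cds)) < expected_length_aa


wild_type = "ATGGCTTGGAAATAA"  # hypothetical: M A W K stop
mutant = "ATGGCTTGAAAATAA"     # TGG -> TGA creates a premature stop
print(translate(wild_type), translate(mutant))  # MAWK MA
print(is_truncated(mutant, 4))                  # True
```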



Since cytokines are involved in the transcription of some SOCS molecules, the detection of SOCS
provides surrogate markers for cytokines or cytokine activity. This may be useful in assessing
subjects with a range of conditions, such as those with autoimmune diseases, for example,
rheumatoid arthritis, diabetes and stiff man syndrome, amongst others.

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic
acid molecules of the present invention are generally mRNA.

Although the nucleic acid molecules of the present invention are generally in isolated form, they
may be integrated into or ligated to or otherwise fused or associated with other genetic
molecules, such as vector molecules and in particular expression vector molecules. Vectors and
expression vectors are generally capable of replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell. Preferred prokaryotic cells include E. coli,
Bacillus sp. and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian
and insect cells.

Accordingly, another aspect of the present invention contemplates a genetic construct
comprising a vector portion and a mammalian and more particularly a human SOCS gene
portion, which SOCS gene portion is capable of encoding a SOCS polypeptide or a functional
or immunologically interactive derivative thereof.

Preferably, the SOCS gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said SOCS gene portion
in an appropriate cell.

In addition, the SOCS gene portion of the genetic construct may comprise all or part of the
gene fused to another genetic sequence, such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells
comprising same.

The present invention also extends to any or all derivatives of SOCS, including mutants, parts,
fragments, portions, homologues and analogues, or their encoding genetic sequences, including
single or multiple nucleotide or amino acid substitutions, additions and/or deletions to the
naturally occurring nucleotide or amino acid sequence. The present invention also extends to
mimetics, agonists and antagonists of SOCS.

SOCS and its genetic sequence of the present invention will be useful in the generation of
a range of therapeutic and diagnostic reagents and will be especially useful in the detection of
a cytokine involved in a particular cellular response or a receptor for that cytokine. For
example, cells expressing a SOCS gene, such as M1 cells expressing the SOCS1 gene, will no
longer be responsive to a particular cytokine such as, in the case of SOCS1, IL-6. Clearly, the
present invention further contemplates cells such as M1 cells expressing any SOCS gene, such
as SOCS1 to SOCS15. Furthermore, the present invention provides the use of molecules
that regulate or potentiate the activity of therapeutic cytokines. For example, molecules which
block some SOCS activity may act to potentiate therapeutic cytokine activity (e.g. G-CSF).

Soluble SOCS polypeptides are also contemplated to be particularly useful in the treatment of
disease, injury or abnormality involving cytokine mediated cellular responsiveness such as
hyperimmunity, immunosuppression, allergies, hypertension and the like.

A further aspect of the present invention contemplates the use of SOCS or its functional
derivatives in the manufacture of a medicament for the treatment of conditions involving
cytokine mediated cellular responsiveness.

The present invention further contemplates transgenic mammalian cells expressing a SOCS
gene. Such cells are useful indicator cell lines for assaying for suppression of cytokine function.
One example is M1 cells expressing a SOCS gene. Such cell lines may be useful for screening
for cytokines or screening molecules, such as naturally occurring molecules from plants, coral,
microorganisms or bio-organically active soil or water, capable of acting as cytokine antagonists
or agonists.

The present invention further contemplates hybrids between different SOCS from the same or
different animal species. For example, a hybrid may be formed between all or a functional part
of mouse SOCS1 and human SOCS1. Alternatively, the hybrid may be between all or part of
mouse SOCS1 and mouse SOCS2. All such hybrids are contemplated herein and are
particularly useful in developing pleiotropic molecules.

The present invention further contemplates a range of genetically based diagnostic assays
screening for individuals with defective SOCS genes. Such mutations may result in cell types not
being responsive to a particular cytokine, or in over-responsiveness leading to a range of
conditions. The SOCS genetic sequence can be readily verified using a range of PCR or other
techniques to determine whether a mutation is resident in the gene. Appropriate gene therapy
or other interventionist therapy may then be adopted.
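
Once a SOCS amplicon has been sequenced, a first screen for point mutations compares it position by position against a reference sequence. The following is a minimal sketch. It assumes the two sequences are already aligned and of equal length. Insertions and deletions would need a proper alignment step first, which is not shown. Both sequences are hypothetical.

```python
def find_substitutions(reference: str, sample: str):
    """List (position, reference base, sample base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned and of
    equal length; insertions and deletions need a proper alignment first.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, r, s)
        for pos, (r, s) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if r != s
    ]


reference = "ATGGCTTGGAAA"  # hypothetical reference fragment
sample = "ATGGCTTGAAAA"     # hypothetical PCR product from a subject
print(find_substitutions(reference, sample))  # [(9, 'G', 'A')]
```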

The present invention is further described by the following non-limiting Examples. 

Examples 1-16 relate to SOCS1, SOCS2 and SOCS3, which were identified on the basis of
activity. Examples 17-24 relate to various aspects of SOCS4 to SOCS15, which were cloned
initially on the basis of sequence similarity. Examples 25-36 relate to specific aspects of SOCS4
to SOCS15, respectively.

EXAMPLE 1

CELL CULTURE AND CYTOKINES 
The M1 cell line was derived from a spontaneously arising leukaemia in SL mice [Ichikawa,
1969]. Parental M1 cells used in this study have been in passage at the Walter and Eliza Hall
Institute of Medical Research, Melbourne, Victoria, Australia, for approximately 10 years. M1
cells were maintained by weekly passage in Dulbecco's modified Eagle's medium (DME)
containing 10% (v/v) foetal bovine serum (FCS). Recombinant cytokines were obtained from
commercial sources or were prepared by published methods. Recombinant murine LIF was
produced in Escherichia coli and purified as previously described [Gearing, 1989]. Purified
human oncostatin M was purchased from PeproTech Inc (Rocky Hill, NJ, USA), and purified
mouse IFN-γ was obtained from Genzyme Diagnostics (Cambridge, MA, USA). Recombinant
murine thrombopoietin was produced as a FLAG™-tagged fusion protein in CHO cells and then
purified.

EXAMPLE 2 

AGAR COLONY ASSAYS

In order to assay the differentiation of M1 cells in response to cytokines, 300 cells were
cultured in 35 mm Petri dishes containing 1 ml of DME supplemented with 20% (v/v) foetal calf
serum (FCS), 0.3% (w/v) agar and 0.1 ml of serial dilutions of IL-6, LIF, OSM, IFN-γ, TPO or
dexamethasone (Sigma Chemical Company, St Louis, MO). After 7 days of culture at 37°C in a
fully humidified atmosphere containing 10% (v/v) CO2 in air, colonies of M1 cells were
counted and classified as differentiated if they were composed of dispersed cells or had a
corona of dispersed cells around a tightly packed centre.
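
The colony classification above can be expressed as a percentage of differentiated colonies at each concentration of the dilution series. The following is a minimal sketch. The text does not give the dilution factor, so a twofold series is assumed. The class and function names and the colony counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ColonyCount:
    """Colony counts from one agar culture at one stimulus concentration."""
    concentration: float   # e.g. ng/ml of IL-6 (hypothetical units)
    differentiated: int    # dispersed cells, or a corona around a tight centre
    undifferentiated: int  # tightly packed colonies without a corona

    @property
    def total(self) -> int:
        return self.differentiated + self.undifferentiated

    @property
    def percent_differentiated(self) -> float:
        return 100.0 * self.differentiated / self.total if self.total else 0.0


def serial_dilutions(top: float, factor: float, n: int):
    """Concentrations of an n-step serial dilution starting at `top`."""
    return [top / factor**i for i in range(n)]


print(serial_dilutions(100, 2, 4))                      # [100.0, 50.0, 25.0, 12.5]
print(ColonyCount(100, 30, 10).percent_differentiated)  # 75.0
```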

EXAMPLE 3 

GENERATION OF RETROVIRAL LIBRARY

A cDNA expression library was constructed from the factor-dependent haemopoietic cell line
FDC-P1 essentially as described [Rayner, 1994]. Briefly, cDNA was cloned into the retroviral
vector pRUFneo and then transfected into an amphotropic packaging cell line (PA317).
Transiently generated virus was harvested from the cell supernatant at 48 hr post-transfection
and used to infect Ψ2 ecotropic packaging cells, to generate a high-titre virus-producing cell
line.

EXAMPLE 4 
RETROVIRAL INFECTION OF M1 CELLS

Pools of 10* infected Ψ2 cells were irradiated (3000 rad) and cocultivated with 10^ M1 cells
in DME supplemented with 10% (v/v) FCS and 4 µg/ml Polybrene for 2 days at 37°C. To
select for IL-6-unresponsive clones, retrovirally infected M1 cells were washed once in DME
and cultured at approximately 2x10" cells/ml in 1 ml agar cultures containing 400 µg/ml
geneticin (GibcoBRL, Grand Island, NY) and 100 ng/ml IL-6. The efficiency of infection of
M1 cells was 1-2%, as estimated by agar plating the infected cells in the presence of geneticin
only.
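
The 1-2% infection efficiency is a ratio of geneticin-resistant colonies to cells plated. The following is a minimal sketch of that arithmetic. It assumes that every resistant colony derives from a single infected cell and that plating efficiency is 100%, which simplifies real agar cultures. The function names and counts are hypothetical.

```python
def infection_efficiency_percent(resistant_colonies: int, cells_plated: int) -> float:
    """Percentage of plated cells that gave geneticin-resistant colonies."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return 100.0 * resistant_colonies / cells_plated


def expected_infected(cells: int, efficiency_percent: float) -> float:
    """Expected number of infected cells in a population at a given efficiency."""
    return cells * efficiency_percent / 100.0


eff = infection_efficiency_percent(300, 20_000)  # hypothetical counts
print(eff)                                       # 1.5
print(expected_infected(1_000_000, eff))         # 15000.0
```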

EXAMPLE 5 
PCR 

Genomic DNA from retrovirally infected M1 cells was digested with Sac I, and 1 µg of
phenol/chloroform-extracted DNA was then amplified by polymerase chain reaction (PCR).
Primers used for amplification of cDNA inserts from the integrated retrovirus were GAG3 (5'
CACGCCGCCCACGTGAAGGC 3' [SEQ ID NO:1]), which corresponds to the vector gag
sequence approximately 30 bp 5' of the multiple cloning site, and HSVTK (5'
TTCGCCAATGACAAGACGCT 3' [SEQ ID NO:2]), which corresponds to the pMC1neo
sequence approximately 200 bp 3' of the multiple cloning site. The PCR entailed an initial
denaturation at 94°C for 5 min, 35 cycles of denaturation at 94°C for 1 min, annealing at 56°C
for 2 min and extension at 72°C for 3 min, followed by a final 10 min extension. PCR products
were gel purified, ligated into the pGEM-T plasmid (Promega, Madison, WI) and
sequenced using an ABI PRISM Dye Terminator Cycle Sequencing Kit and a Model 373
Automated DNA Sequencer (Applied Biosystems Inc., Foster City, CA).
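
The thermal-cycling program above can be written down as data, which makes its total hold time easy to check: 5 + 35 × (1 + 2 + 3) + 10 = 225 minutes. The sketch below encodes the program as described. Two assumptions apply. First, the text does not state the temperature of the final extension, so 72°C is assumed. Second, ramping time between steps is ignored. The class and variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    minutes: float


INITIAL_STEPS = [Step("initial denaturation", 94, 5)]
CYCLE_STEPS = [
    Step("denaturation", 94, 1),
    Step("annealing", 56, 2),
    Step("extension", 72, 3),
]
FINAL_STEPS = [Step("final extension", 72, 10)]  # 72°C assumed; not stated in text
CYCLES = 35


def hold_time_minutes(initial, cycle, n_cycles, final):
    """Total time spent at set temperatures, ignoring ramping between steps."""
    return (sum(s.minutes for s in initial)
            + n_cycles * sum(s.minutes for s in cycle)
            + sum(s.minutes for s in final))


print(hold_time_minutes(INITIAL_STEPS, CYCLE_STEPS, CYCLES, FINAL_STEPS))  # 225
```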

EXAMPLE 6
CLONING OF cDNAs 

Independent cDNA clones encoding mouse SOCS1 were isolated from a murine thymus cDNA
library essentially as described (Hilton et al, 1994). The nucleotide and predicted amino acid
sequences of mouse SOCS1 cDNA were compared to databases using the BLASTN and
TFASTA algorithms (Pearson and Lipman, 1988; Pearson, 1990; Altschul et al, 1990).
Oligonucleotides were designed from the ESTs encoding human SOCS1 and mouse SOC-1 and
SOCS3 and used to probe commercially available mouse thymus and spleen cDNA libraries.
Sequencing was performed using an ABI automated sequencer according to the manufacturer's
instructions.

EXAMPLE 7 

SOUTHERN AND NORTHERN BLOT ANALYSES AND RT-PCR 

32P-labelled probes were generated using a random decanucleotide labelling kit (Bresatec,
Adelaide, South Australia) from a 600 bp Pst I fragment encoding neomycin phosphotransferase
from the plasmid pPGKneo, a 1070 bp fragment of the SOCS1 gene obtained by digestion of the
1.4 kbp PCR product with Xho I, SOCS2, SOCS3, CIS and a 1.2 kbp fragment of the chicken
glyceraldehyde 3-phosphate dehydrogenase gene [Dugaiczyk, 1983].

Genomic DNA was isolated from cells using a proteinase K-sodium dodecyl sulfate procedure
essentially as described. Fifteen micrograms of DNA were digested with either BamH I or
Sac I, fractionated on a 0.8% (w/v) agarose gel, transferred to GeneScreenPlus membrane
(Du Pont NEN, Boston MA), prehybridised, hybridised with random-primed 32P-labelled DNA
fragments and washed essentially as described [Sambrook, 1989].

Total RNA was isolated from cells and tissues using Trizol Reagent, as recommended by the
manufacturer (GibcoBRL, Grand Island, NY). When required, polyA+ mRNA was purified
essentially as described [Alexander, 1995]. Northern blots were prehybridised, hybridised with
random-primed 32P-labelled DNA fragments and washed as described [Alexander, 1995].

To assess the induction of SOCS genes by IL-6, mice (C57BL6) were injected intravenously
with 5 µg IL-6, followed by harvest of the liver at the indicated timepoints after injection. M1
cells were cultured in the presence of 20 ng/ml IL-6 and harvested at the indicated times. For
RT-PCR analysis, bone marrow cells were harvested as described (Metcalf et al, 1995) and
stimulated for 1 hr at 37°C with 100 ng/ml of a range of cytokines. RT-PCR was performed
on total RNA as described (Metcalf et al, 1995). PCR products were resolved on an agarose
gel, and Southern blots were hybridised with probes specific for each SOCS family member.
Expression of β-actin was assessed to ensure uniformity of amplification.

EXAMPLE 8 

DNA CONSTRUCTS AND TRANSFECTION

A cDNA encoding epitope-tagged SOCS1 was generated by subcloning the entire SOCS1
coding region into the pEF-BOS expression vector [Mizushima, 1990], engineered to encode
an in-frame FLAG epitope downstream of an initiation methionine (pF-SOCS1). Using
electroporation as described previously [Hilton, 1994], M1 cells expressing the thrombopoietin
receptor (M1.mpl) were transfected with 20 µg of Aat II-digested pF-SOCS1 expression
plasmid and 2 µg of a Sca I-digested plasmid in which transcription of a cDNA encoding
puromycin N-acetyl transferase was driven from the mouse phosphoglycerokinase promoter
(pPGKPuropA). After 48 hours in culture, transfected cells were selected with 20 µg/ml
puromycin (Sigma Chemical Company, St Louis MO) and screened for expression of SOCS1
by Western blotting, using the M2 anti-FLAG monoclonal antibody according to the
manufacturer's instructions (Eastman Kodak, Rochester NY). In other experiments, M1 cells
were transfected with only the pF-SOCS1 plasmid or a control and selected by their ability to
grow in agar in the presence of 100 ng/ml of IL-6.

EXAMPLE 9

IMMUNOPRECIPITATION AND WESTERN BLOTTING 

Prior to either immunoprecipitation or Western blotting, 10' M1 cells or their derivatives were
washed twice, resuspended in 1 ml of DME and incubated at 37°C for 30 min. The cells were
then stimulated for 4 min at 37°C with either saline or 100 ng/ml IL-6, after which sodium
vanadate (Sigma Chemical Co., St Louis, MO) was added to a concentration of 1 mM. Cells
were placed on ice, washed once with saline containing 1 mM sodium vanadate and then
solubilised for 5 min on ice with 300 µl of 1% (v/v) Triton X-100, 150 mM NaCl, 2 mM EDTA,
50 mM Tris-HCl pH 7.4, containing Complete protease inhibitors (Boehringer Mannheim,
Mannheim, Germany) and 1 mM sodium vanadate. Lysates were cleared by centrifugation and
quantitated using a Coomassie Protein Assay Reagent (Pierce, Rockford IL).

For immunoprecipitations, equal amounts of protein extract (1-2 mg) were incubated
for 1 hr or overnight at 4°C with either 4 µg of anti-gp130 antibody (M20; Santa Cruz
Biotechnology Inc., Santa Cruz, CA) or 4 µg of anti-phosphotyrosine antibody (4G10; Upstate
Biotechnology Inc., Lake Placid NY), and 15 µl packed volume of Protein G Sepharose
(Pharmacia, Uppsala, Sweden) [Hilton et al, 1996]. Immunoprecipitates were washed twice
in 1% (v/v) NP40, 150 mM NaCl, 50 mM Tris-HCl pH 8.0, containing Complete protease
inhibitors (Boehringer Mannheim, Mannheim, Germany) and 1 mM sodium vanadate. The
samples were heated for 5 min at 95°C in SDS sample buffer (625 mM Tris-HCl pH 6.8, 0.05%
(w/v) SDS, 0.1% (v/v) glycerol, bromophenol blue, 0.125% (v/v) 2-mercaptoethanol),
fractionated by SDS-PAGE and immunoblotted as described below.

For Western blotting, 10 µg of protein from a cellular extract, or material from an
immunoprecipitation reaction, was loaded onto 4-15% Ready gels (Bio-Rad Laboratories,
Hercules CA) and resolved by sodium dodecyl sulfate polyacrylamide gel electrophoresis
(SDS-PAGE). Proteins were transferred to PVDF membrane (Micron Separations Inc.,
Westborough MA) for 1 hr at 100 V. The membranes were probed with the following primary
antibodies: anti-tyrosine phosphorylated STAT3 (1:1000 dilution; New England Biolabs,
Beverly, MA); anti-STAT3 (C-20; 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz
CA); anti-gp130 (M20; 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz CA);
anti-phosphotyrosine (horseradish peroxidase-conjugated RC20; 1:5000 dilution; Transduction
Laboratories, Lexington KY); and anti-tyrosine phosphorylated MAP kinase and anti-MAP kinase
antibodies (1:1000 dilution; New England Biolabs, Beverly, MA). Blots were visualised using
peroxidase-conjugated secondary antibodies and Enhanced Chemiluminescence (ECL) reagents
according to the manufacturer's instructions (Pierce, Rockford IL).
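The normalisation steps above come down to simple arithmetic: equal protein per immunoprecipitation, 10 µg per Western blot lane and fixed antibody dilutions. The following is a minimal Python sketch that is not part of the original protocol. The function names, the example lysate concentrations and the 10 ml buffer volume are all hypothetical.

```python
def loading_volume_ul(target_ug: float, conc_ug_per_ul: float) -> float:
    """Volume of lysate (µl) that contains target_ug of protein."""
    if conc_ug_per_ul <= 0:
        raise ValueError("protein concentration must be positive")
    return target_ug / conc_ug_per_ul


def antibody_volume_ul(dilution_factor: float, final_volume_ml: float) -> float:
    """Volume of antibody stock (µl) needed for a 1:dilution_factor solution."""
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return final_volume_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    # Hypothetical concentrations (µg/µl), e.g. read from a Coomassie assay standard curve
    lysates = {"M1 saline": 2.5, "M1 IL-6": 2.0, "4A2 IL-6": 1.6}
    for name, conc in lysates.items():
        print(f"{name}: load {loading_volume_ul(10, conc):.2f} µl for 10 µg")
    # A 1:1000 primary antibody made up in a hypothetical 10 ml of buffer
    print(f"anti-phospho-STAT3 stock: {antibody_volume_ul(1000, 10):.1f} µl")
```

With these illustrative values, the three lysates would be loaded at 4.00, 5.00 and 6.25 µl. A 1:1000 dilution in 10 ml would need 10.0 µl of antibody stock.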

EXAMPLE 10 
ELECTROPHORETIC MOBILITY SHIFT ASSAYS 

Assays were performed as described [Novak, 1995], using the high-affinity SIF (c-sis-inducible
factor) binding site m67 [Wakao, 1994]. Protein extracts were prepared from M1 cells
incubated for 4-10 min at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml
IL-6 or 100 ng/ml IFN-γ. The binding reactions contained 4-6 µg protein (constant within a
given experiment), 5 ng ³²P-labelled m67 oligonucleotide and 800 ng sonicated salmon sperm
DNA. For certain experiments, protein samples were preincubated with an excess of unlabelled
m67 oligonucleotide, or with antibodies specific for either STAT1 (Transduction Laboratories,
Lexington, KY) or STAT3 (Santa Cruz Biotechnology Inc., Santa Cruz CA), as described
[Novak, 1995].

Western blots were performed using anti-tyrosine phosphorylated STAT3 or anti-STAT3 (New
England Biolabs, Beverly, MA) or anti-gp130 (Santa Cruz Biotechnology Inc.) as described
(Nicola et al, 1996). EMSA were performed using the m67 oligonucleotide probe, as described
(Novak et al, 1995).

EXAMPLE 11

EXPRESSION CLONING OF A NOVEL SUPPRESSOR OF 
CYTOKINE SIGNAL TRANSDUCTION 

In order to identify cDNAs capable of suppressing cytokine signal transduction, an expression
cloning approach was adopted. This strategy centred on M1 cells, a monocytic leukaemia cell
line that differentiates into mature macrophages and ceases proliferation in response to the
cytokines IL-6, LIF, OSM and IFN-γ, and the steroid dexamethasone. Parental M1 cells were
infected with the RUFneo retrovirus, into which cDNAs from the factor-dependent
haemopoietic cell line FDC-P1 had been cloned. In this retrovirus, transcription of both the
neomycin resistance gene and the cloned cDNA was driven from the powerful constitutive
promoter present in the retroviral LTR (Figure 1). When cultured in semi-solid agar, parental
M1 cells form large, tightly packed colonies. Upon stimulation with IL-6, M1 cells undergo
rapid differentiation, resulting in the formation in agar of only single macrophages or small
dispersed clusters of cells. Retrovirally infected M1 cells that were unresponsive to IL-6 were
selected in semi-solid agar culture by their ability to form large, tightly packed colonies in the
presence of IL-6 and geneticin. A single stable IL-6-unresponsive clone, 4A2, was obtained
after examining 10" infected cells.

A fragment of the neomycin phosphotransferase (neo) gene was used to probe a Southern blot
of genomic DNA from clone 4A2, which revealed that the cell line was infected with a single
retrovirus containing a cDNA approximately 1.4 kbp in length (Figure 2). PCR amplification
using primers from the retroviral vector that flanked the cDNA cloning site enabled recovery
of a 1.4 kbp cDNA insert, which we have named suppressor of cytokine signalling-1, or
SOCS1. This PCR product was used to probe a similar Southern blot of 4A2 genomic DNA
and hybridised to two fragments: one corresponded to the endogenous SOCS1 gene, and
the other, which matched the size of the band seen using the neo probe, corresponded to the
SOCS1 cDNA cloned into the integrated retrovirus (Figure 2). The latter was not observed in
an M1 cell clone infected with a retrovirus containing an irrelevant cDNA. Similarly, Northern
blot analysis revealed that SOCS1 mRNA was abundant in the cell line 4A2 but not in the
control infected M1 cell clone (Figure 2).

EXAMPLE 12

SOCS1, SOCS2, SOCS3 AND CIS DEFINE A NEW FAMILY
OF SH2-CONTAINING PROTEINS

The SOCS1 PCR product was used as a probe to isolate homologous cDNAs from a mouse
thymus cDNA library. The sequence of the cDNAs proved to be identical to the PCR product,
suggesting that constitutive expression or overexpression, rather than mutation, of the SOCS1
protein was sufficient to generate an IL-6-unresponsive phenotype. Comparison of the SOCS1
cDNA sequence with nucleotide sequence databases revealed that it was present on mouse and
rat genomic DNA clones containing the protamine gene cluster found on mouse chromosome
16. Closer inspection revealed that the 1.4 kb SOCS1 sequence was not homologous to any
of the protamine genes, but rather represented a previously unidentified open reading frame
located at the extreme 3' end of these clones (Figure 3). There were no regions of discontinuity
between the sequences of the SOCS1 cDNA and the genomic locus, suggesting that SOCS1 is
encoded by a single exon. In addition to the genomic clone containing the protamine genes, a
series of murine and human expressed sequence tags (ESTs) also revealed large blocks of
nucleotide sequence identity to mouse SOCS1. The sequence information provided by the
human ESTs allowed the rapid cloning of cDNAs encoding human SOCS1.

The mouse and rat SOCS1 genes encode a 212 amino acid protein, whereas the human SOCS1
gene encodes a 211 amino acid protein. Mouse, rat and human SOCS1 proteins share 95-99%
amino acid identity (Figure 9). A search of translated nucleic acid databases with the predicted
amino acid sequence of SOCS1 showed that it was most closely related to a recently cloned
cytokine-inducible immediate early gene product, CIS, and to two classes of ESTs. Full length
cDNAs from the two classes of ESTs were isolated and found to encode proteins of similar
length and overall structure to SOCS1 and CIS. These clones were given the names SOCS2
and SOCS3. Each of the four proteins contains a central SH2 domain and a C-terminal region
termed the SOCS motif. The SOCS1 proteins exhibit an extremely high level of amino acid
sequence similarity (95-99% identity) amongst different species. However, SOCS1, SOCS2,
SOCS3 and CIS from the same species, while clearly defining a new family of SH2-containing
proteins, exhibit a lower level of amino acid identity to one another. SOCS2 and CIS share
approximately 38% amino acid identity, while the remaining members of the family share
approximately 25% amino acid identity (Figure 9). The coding regions of the genes for SOCS1
and SOCS3 appear to contain no introns, while the coding regions of the genes for SOCS2 and
CIS contain one and two introns, respectively.
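Percent identity figures such as the 95-99%, approximately 38% and approximately 25% values above are calculated from a pairwise alignment. The minimal sketch below illustrates one common convention: identical residues divided by ungapped aligned positions, for two sequences that have already been aligned. The function name and the toy fragments are hypothetical, and the alignment step itself is assumed to happen elsewhere. Other conventions, such as dividing by the length of the shorter sequence, give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded under this convention
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (not real SOCS sequences)
print(f"{percent_identity('MVAR-NQ', 'MVSRKNQ'):.1f}%")
```

Here, five of the six ungapped columns match, so the example prints 83.3%.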

The Genbank Accession Numbers for the sequences referred to herein are: mouse SOCS1
cDNA (U88325), human SOCS1 cDNA (U88326), mouse SOCS2 cDNA (U88327) and mouse
SOCS3 cDNA (U88328).

EXAMPLE 13

CONSTITUTIVE EXPRESSION OF SOCSl SUPPRESSES THE 
ACTION OF A RANGE OF CYTOKINES 

To formally establish that the phenotype of the 4A2 cell line was directly related to expression
of SOCS1, and not to unrelated genetic changes that may have occurred independently in
these cells, a cDNA encoding an epitope-tagged version of SOCS1 under the control of the
EF1α promoter was transfected into parental M1 cells and into M1 cells expressing the receptor
for thrombopoietin, c-mpl (M1.mpl). Transfection of the SOCS1 expression vector into both
cell lines resulted in an increase in the frequency of IL-6-unresponsive M1 cells.

Multiple independent clones of M1 cells expressing SOCS1, as detected by Western blot,
displayed a cytokine-unresponsive phenotype that was indistinguishable from 4A2. Further, if
transfectants were not maintained in puromycin, expression of SOCS1 was lost over time and
cells regained their cytokine responsiveness. In the absence of cytokine, colonies derived from
4A2 and other SOCS1-expressing clones characteristically grew to a smaller size than colonies
formed by control M1 cells (Figure 10).

The effect of constitutive SOCS1 expression on the response of M1 cells to a range of
cytokines was investigated using the 4A2 cell line and a clone of M1.mpl cells expressing
SOCS1 (M1.mpl.SOCS1). Unlike parental M1 cells and M1.mpl cells, the two cell lines
expressing SOCS1 continued to proliferate and failed to form differentiated colonies in response
to IL-6, LIF, OSM, IFN-γ or, in the case of the M1.mpl.SOCS1 cell line, thrombopoietin
(Figure 4). For both cell lines, however, a normal response to dexamethasone was observed,
suggesting that SOCS1 specifically affected cytokine signal transduction rather than
differentiation per se. Consistent with these data, while parental M1 cells and M1.mpl cells
became large and vacuolated in response to IL-6, 4A2 and M1.mpl.SOCS1 cells showed no
evidence of morphological differentiation in response to IL-6 or other cytokines (Figure 5).

EXAMPLE 14 

SOCS1 INHIBITS A RANGE OF IL-6 SIGNAL TRANSDUCTION
PROCESSES, INCLUDING STAT3 PHOSPHORYLATION
AND ACTIVATION

Phosphorylation of the cell surface receptor component gp130, the cytoplasmic tyrosine kinase
JAK1 and the transcription factor STAT3 is thought to play a central role in IL-6 signal
transduction. These events were compared in the parental M1 and M1.mpl cell lines and their
SOCS1-expressing counterparts. As expected, gp130 was phosphorylated rapidly in response
to IL-6 in both parental lines; however, this was reduced five- to ten-fold in the cell lines
expressing SOCS1 (Figure 6). Likewise, STAT3 phosphorylation in response to IL-6 was
reduced by approximately ten-fold in the cell lines expressing SOCS1 (Figure 6).
Consistent with a reduction in STAT3 phosphorylation, activation of specific STAT DNA-binding
complexes, as determined by electrophoretic mobility shift assay, was also reduced.
Notably, there was a reduction in the formation of SIF-A (containing STAT3), SIF-B
(STAT1/STAT3 heterodimer) and SIF-C (containing STAT1), the three STAT complexes
induced in M1 cells stimulated with IL-6 (Figure 7). Similarly, constitutive expression of
SOCS1 also inhibited IFN-γ-stimulated formation of p91 (STAT1) homodimers (Figure 7). STAT
phosphorylation and activation were not the only cytoplasmic processes to be affected by
SOCS1 expression, as the phosphorylation of other proteins, including Shc and MAP kinase,
was reduced to a similar extent (Figure 7).

EXAMPLE 15 

TRANSCRIPTION OF THE SOCS1 GENE IS STIMULATED BY IL-6
IN VITRO AND IN VIVO

Although SOCS1 can inhibit cytokine signal transduction when constitutively expressed in M1
cells, this does not necessarily indicate that SOCS1 normally functions to negatively regulate
an IL-6 response. To investigate this possibility, the inventors determined whether
transcription of the SOCS1 gene is regulated in the response of M1 cells to IL-6 and, because
of the critical role IL-6 plays in regulating the acute phase response to injury and infection, in
the response of the liver to intravenous injection of 5 mg IL-6. In the absence of IL-6, SOCS1
mRNA was undetectable in either M1 cells or the liver. However, for both cell types, a 1.4
kb SOCS1 transcript was induced within 20 to 40 minutes by IL-6 (Figure 8). For M1 cells,
where IL-6 was present throughout the experiment, the level of SOCS1 mRNA remained
elevated (Figure 8). In contrast, IL-6 administered in vivo by a single intravenous injection
was rapidly cleared from the circulation, resulting in a pulse of IL-6 stimulation to the liver.
Consistent with this, transient expression of SOCS1 mRNA was detectable in the liver, peaking
approximately 40 minutes after injection and declining to basal levels within 4 hours (Figure 8).

EXAMPLE 16 

REGULATION OF SOCS GENES

Since CIS was cloned as a cytokine-inducible immediate early gene, the inventors examined
whether SOCS1, SOCS2 and SOCS3 were similarly regulated. The basal pattern of expression
of the four SOCS genes was examined by Northern blot analysis of mRNA from a variety of
tissues from male and female C57BL/6 mice (Figure 11A). Constitutive expression of SOCS1
was observed in the thymus and, to a lesser extent, in the spleen and the lung. SOCS2
expression was restricted primarily to the testis and, in some animals, the liver and lung; for
SOCS3, a low level of expression was observed in the lung, spleen and thymus, while CIS
expression was more widespread, including the testis, heart, lung, kidney and, in some animals,
the liver.

The inventors sought to determine whether expression of the four SOCS genes was regulated
by IL-6. Northern blots of mRNA prepared from the livers of untreated and IL-6-injected
mice, or from unstimulated and IL-6-stimulated M1 cells, were hybridised with labelled
fragments of SOCS1, SOCS2, SOCS3 and CIS cDNAs (Figure 11B). Expression of all four
SOCS genes was increased in the liver following IL-6 injection; however, the kinetics of
induction appeared to differ. Expression of SOCS1 and SOCS3 was transient in the liver, with
mRNA detectable 20 minutes after IL-6 injection and declining to basal levels within 4 hours
for SOCS1 and 8 hours for SOCS3. Induction of SOCS2 and CIS mRNA in the liver followed
similar initial kinetics to that of SOCS1, but was maintained at an elevated level for at least 24
hours. A similar induction of SOCS gene mRNA was observed in other organs, notably the
lung and the spleen. In contrast, in M1 cells, while SOCS1 and CIS mRNA were induced by
IL-6, no induction of either SOCS2 or SOCS3 expression was detected. This result highlights
cell type-specific differences in the expression of SOCS family genes in response to the same
cytokine.

To examine the spectrum of cytokines capable of inducing transcription of the various
members of the SOCS gene family, bone marrow cells were stimulated for an hour with a range
of cytokines, after which mRNA was extracted and cDNA was synthesised. PCR was then used
to assess the expression of SOCS1, SOCS2, SOCS3 and CIS (Figure 11C). In the absence of
stimulation, little or no expression of any of the SOCS genes was detectable in bone marrow
by PCR. Stimulation of bone marrow cells with a broad array of cytokines appeared capable of
upregulating mRNA for one or more members of the SOCS family. IFNγ, for example, induced
expression of all four SOCS genes, while erythropoietin, granulocyte colony-stimulating factor,
granulocyte-macrophage colony-stimulating factor and interleukin-3 induced expression of
SOCS2, SOCS3 and CIS. Interestingly, tumor necrosis factor alpha, macrophage
colony-stimulating factor and interleukin-1, which act through receptors that do not fall into
the type I cytokine receptor class, also appeared capable of inducing expression of SOCS3 and
CIS, suggesting that SOCS proteins may play a broader role in regulating signal transduction.

As constitutive expression of SOCS1 inhibited the response of M1 cells to a range of cytokines,
the inventors examined whether phosphorylation of the cell surface receptor component gp130
and the transcription factor STAT3, which are thought to play a central role in IL-6 signal
transduction, was affected. These events were compared in the parental M1 and M1.mpl cell
lines and their SOCS1-expressing counterparts. As expected, gp130 was phosphorylated
rapidly in response to IL-6 in both parental lines; however, this was reduced in the cell lines
expressing SOCS1 (Figure 12A). Likewise, STAT3 phosphorylation in response to IL-6 was
also reduced in those cell lines expressing SOCS1 (Figure 12A). Consistent with a
reduction in STAT3 phosphorylation, activation of specific STAT/DNA binding complexes, as
determined by electrophoretic mobility shift assay, was also reduced. Notably, there was a
failure to form SIF-A (containing STAT3) and SIF-B (STAT1/STAT3 heterodimer), the major
STAT complexes induced in M1 cells stimulated with IL-6 (Figure 12B). Similarly, constitutive
expression of SOCS1 also inhibited IFNγ-stimulated formation of SIF-C (STAT1 homodimer;
Figure 12B). These experiments are consistent with the proposal that SOCS1 inhibits signal
transduction upstream of receptor and STAT phosphorylation, potentially at the level of the
JAK kinases.

The ability of SOCS1 to inhibit signal transduction, and ultimately the biological response to
cytokines, suggests that, like the SH2-containing phosphatase SHP-1 [Ihle et al, 1994; Yi et al,
1993], the SOCS proteins may play a central role in controlling the intensity and/or duration
of a cell's response to a diverse range of extracellular stimuli by suppressing the signal
transduction process. The evidence provided here indicates that the SOCS family acts in a
classical negative feedback loop for cytokine signal transduction. Like other genes such as
OSM, expression of genes encoding the SOCS proteins is induced by cytokines through the
activation of STATs. It is proposed that, once expressed, the SOCS proteins inhibit the activity
of JAKs and so reduce the phosphorylation of receptors and STATs, thereby suppressing signal
transduction and any ensuing biological response. Importantly, inhibition of STAT activation
will, over time, lead to a reduction in SOCS gene expression, allowing cells to regain
responsiveness to cytokines.
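A deliberately simple toy model can illustrate the negative feedback loop proposed above, in which STAT activity induces SOCS and SOCS protein in turn damps JAK/STAT signalling. The Python sketch below is an illustrative assumption, not a model taken from this specification. It treats STAT activity as the stimulus divided by (1 + k·SOCS), produces SOCS in proportion to STAT activity and removes it by first-order decay. All parameter values are arbitrary.

```python
def simulate(stimulus: float, k: float = 5.0, a: float = 1.0, d: float = 0.5,
             dt: float = 0.01, steps: int = 2000) -> list[tuple[float, float]]:
    """Forward-Euler integration of a toy STAT -> SOCS -| STAT feedback loop.

    Returns (stat_activity, socs_level) pairs in arbitrary units.
    k: strength of SOCS inhibition; a: SOCS induction rate; d: SOCS decay rate.
    """
    socs = 0.0  # no SOCS before stimulation
    trace = []
    for _ in range(steps):
        stat = stimulus / (1.0 + k * socs)   # SOCS inhibits STAT activation
        socs += dt * (a * stat - d * socs)   # STAT induces SOCS; SOCS decays
        trace.append((stat, socs))
    return trace


with_feedback = simulate(1.0)
without_feedback = simulate(1.0, k=0.0)
print("initial STAT activity:", with_feedback[0][0])
print("final STAT activity with feedback:", round(with_feedback[-1][0], 3))
print("final STAT activity without feedback:", without_feedback[-1][0])
```

With these arbitrary parameters, STAT activity starts at 1.0 and settles near 0.27. This value is the positive root of 10x² + x − 1 = 0, obtained from SOCS = 2·STAT at steady state. Without feedback (k = 0), activity stays at 1.0.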

EXAMPLE 17 

DATABASE SEARCHES

The NCBI genetic sequence database (Genbank), which encompasses the major database of
expressed sequence tags (ESTs), and the TIGR database of human expressed sequence tags were
searched for sequences with similarity to a consensus SOCS box sequence using the TFASTA
and MOTIF/PATTERN algorithms [Pearson, 1990; Cockwell and Giles, 1989]. Using the
software package SRS [Etzold et al, 1996], ESTs that exhibited similarity to the SOCS box
(and their partners derived from sequencing the other end of the same cDNAs) were retrieved
and assembled into contigs using Autoassembler (Applied Biosystems, Foster City, CA).
Consensus nucleotide sequences derived from overlapping ESTs were then used to search the
various databases using BLASTN [Altschul et al, 1990]. Again, positive ESTs were retrieved
and added to the contig. This process was repeated until no additional ESTs could be
recovered. Final consensus nucleotide sequences were then translated using Sequence
Navigator (Applied Biosystems, Foster City, CA).
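The iterative retrieval described above is a fixed-point (transitive-closure) computation: search, add any new ESTs, and repeat until nothing new is recovered. The Python sketch below illustrates only that control flow. As an assumption for illustration, a crude test for a shared exact k-mer replaces the TFASTA/BLASTN similarity searches. The EST identifiers and sequences are toy values, not entries from Genbank or TIGR.

```python
from collections import deque


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlaps(a: str, b: str, k: int) -> bool:
    """Crude stand-in for a similarity search: do a and b share any exact k-mer?"""
    return not kmers(a, k).isdisjoint(kmers(b, k))


def expand_contig(seeds: list[str], database: dict[str, str], k: int = 8) -> set[str]:
    """Recover ESTs that overlap any EST already recovered, and keep going
    until no further ESTs can be added."""
    recovered = set(seeds)
    queue = deque(seeds)
    while queue:
        current = database[queue.popleft()]
        for est_id, seq in database.items():
            if est_id not in recovered and overlaps(current, seq, k):
                recovered.add(est_id)
                queue.append(est_id)
    return recovered


toy_db = {
    "est1": "AAAACCCCGGGGTTTT",
    "est2": "GGGGTTTTACGTACGT",  # shares GGGGTTTT with est1
    "est3": "ACGTACGTCATGCATG",  # shares ACGTACGT with est2 only
    "est4": "TTTTTTTTTTTTTTTT",  # shares no 8-mer with the others
}
print(sorted(expand_contig(["est1"], toy_db)))
```

Starting from est1, the loop reaches est3 only through est2 and never adds est4, so it prints `['est1', 'est2', 'est3']`. The procedure in the text differs in one respect: it searches with a re-derived consensus after each round rather than comparing raw ESTs pairwise.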

The ESTs encoding the new SOCS proteins are as follows: human SOCS4 (ESTS 1149,
EST180909, EST182619, ya99H09, ye70c04, yh53c09, yh77g11, yh87h05, yi45h07, yj04e06,
yq12h06, yq56a06, yq60e02, yq92g03, yq97h06, yr90f01, yt69c03, yv30a08, yv55f07,
yv57h09, yv87h02, yv98e11, yw68d10, yw82a03, yx08a07, yx72h06, yx76b09, yy37h08,
yy66b02, za81f08, zb18f07, zc06e08, zd14g06, zd51h12, zd52b09, ze25g11, ze69f02, zf54f03,
zh96e07, zv66h12, zs83a08 and zs83g08); mouse SOCS-4 (mc65f04, mf42e06, mp10c10,
mr81g09 and mt19h12); human SOCS-5 (EST15B103, EST15B105, EST27530 and
zfSOfOl); mouse SOCS-5 (mc55a01, mh98f09, my26h12 and ve24e06); human SOCS-6
(yf61e08, yf93a09, yg05f12, yg41f04, yg45c02, yh11f10, yh13b05, zc35a12, ze02h08, zl09a03,
zl69e10, zn39d08 and zo39e06); mouse SOCS-6 (mc04c05, md48a03, mf31d03, mh26b07,
mh78e11, mh88h09, mh94h07, mi27h04, mj29c05, mp66g04, mw75g03, va53b05,
vb34h02, vc55d07, vc59e05, vc67d03, vc68d10, vc97h01, vc99c08, vd07h03, vdOScOl,
vd09b12, vd19b02, vd29a04 and vd46d06); human SOCS-7 (STS WI30171, EST00939,
EST12913, yc29b05, yp49f10, zt10f03 and zx73g04); mouse SOCS-7 (mj39a01 and
vi52h07); mouse SOCS-8 (mj6e09 and vj27a029); human SOCS-9 (CSRL-82f2-u,
EST114054, yy06b07, yy06g06, zr40c09, zr72h01, yx92c08, yx93b08 and hfe0662); mouse
SOCS-9 (me65d05); human SOCS-10 (aa48h10, zp35h01, zp97h12, zqOShOl, zr34g05,
EST73000 and HSDHEI005); mouse SOCS-10 (mb14d12, mb40f06, mg89b11, mq89e12,
mp03g12 and vh53c11); human SOCS-11 (zt24h06 and zr43b02); human SOCS-13
(EST59161); mouse SOCS-13 (ma39a09, me60c05, mi78g05, mk10c11, mo48g12, mp94a01,
vb57c07 and vh07c11); human SOCS-14 (mi75e03, vd29h11 and vd53g07).

EXAMPLE 18
cDNA CLONING 

Based on the consensus sequences derived from overlapping ESTs, oligonucleotides were
designed that were specific to various members of the SOCS family. As described above,
oligonucleotides were labelled and used to screen commercially available genomic and cDNA
libraries cloned in λ bacteriophage. Genomic and/or cDNA clones covering the entire coding
region of mouse SOCS4, mouse SOCS5 and mouse SOCS6 were isolated. The entire gene for
SOCS15 is on the human 12p13 BAC (Genbank Accession Number HSU47924) and the mouse
chromosome 6 BAC (Genbank Accession Number AC002393). Partial cDNAs for mouse
SOCS7, SOCS9, SOCS10, SOCS11, SOCS12, SOCS13 and SOCS14 were also isolated.

EXAMPLE 19 
NORTHERN BLOTS AND rtPCR 

Northern blots were performed as described above. The sources of hybridisation probes were
as follows: (i) the entire coding region of the mouse SOCS1 cDNA; (ii) a 1059 bp PCR product
derived from the coding region of SOCS5 upstream of the SH2 domain; (iii) the entire coding
region of the mouse SOCS6 cDNA; (iv) a 790 bp PCR product derived from the coding region
of a partial SOCS7 cDNA; and (v) a 1200 bp Pst I fragment of the chicken glyceraldehyde
3-phosphate dehydrogenase (GAPDH) cDNA.

EXAMPLE 20 
ADDITIONAL MEMBERS OF SOCS FAMILY 

SOCS1, SOCS2 and SOCS3 are members of the SOCS protein family identified in Examples
1-16. Each contains a central SH2 domain and a conserved motif at the C-terminus, named the
SOCS box. In order to isolate further members of this protein family, various DNA databases
were searched with the amino acid sequence corresponding to conserved residues of the SOCS
box. This search revealed the presence of human and mouse ESTs encoding twelve further
members of the SOCS protein family (Figure 13). Using this sequence information, cDNAs
encoding SOCS4, SOCS5, SOCS6, SOCS7, SOCS9, SOCS10, SOCS11, SOCS12, SOCS13,
SOCS14 and SOCS15 have been isolated. Further analysis of contigs derived from ESTs and
cDNAs revealed that the SOCS proteins could be placed into three groups according to their
predicted structure N-terminal of the SOCS box: those with (i) SH2 domains, (ii) WD-40
repeats and (iii) ankyrin repeats.

EXAMPLE 21 
SOCS PROTEINS WITH SH2 DOMAINS

Eight SOCS proteins with SH2 domains have been identified: CIS, SOCS1, SOCS2, SOCS3,
SOCS5, SOCS9, SOCS11 and SOCS14 (Figure 13). Full length cDNAs were isolated for mouse
SOCS5 and SOCS14, and partial clones encoding mouse SOCS9 and SOCS14. Analysis of
primary amino acid sequence and genomic structure suggests that pairs of these proteins
(SOCS1 and SOCS3, SOCS2 and CIS, SOCS5 and SOCS14, and SOCS9 and SOCS11) are
most closely related (Figure 13). Indeed, the SH2 domains of SOCS5 and SOCS14 are almost
identical (Figure 13B) and, unlike CIS, SOCS1, SOCS2 and SOCS3, SOCS5 and SOCS14
have an extensive, though less well conserved, N-terminal region preceding their SH2 domains
(Figure 13A).

EXAMPLE 22 

SOCS PROTEINS WITH WD-40 REPEATS

Four SOCS proteins with WD-40 repeats were identified. As with the SOCS proteins with
SH2 domains, pairs of these proteins appeared to be closely related. Full length cDNAs of
mouse SOCS4 and SOCS6 were isolated and shown to encode proteins containing eight
WD-40 repeats N-terminal of the SOCS box (Figure 13); SOCS4 and SOCS6 share 65% amino
acid similarity. SOCS15 was recognised as an open reading frame upon sequencing BACs from
human chromosome 12p13 and the syntenic region of mouse chromosome 6 [Ansari-Lari et al,
1997]. In the human, chinq) and mouse, SOCS15 is encoded by a gene with two coding exons
that lies within a few hundred base pairs of the 3' end of the triose phosphate isomerase (TPI)
gene, but on the opposite strand to TPI (9). In addition to a C-terminal SOCS box, the
SOCS15 protein contains four WD-40 repeats. Interestingly, the EST databases contain
sequences of nematode, insect and fish relatives of SOCS15. SOCS15 appears most closely
related to SOCS13.

EXAMPLE 23 

SOCS PROTEINS WITH ANKYRIN REPEATS

Three SOCS proteins with ankyrin repeats were identified. Analysis of partial cDNAs of mouse
SOCS7, SOCS10 and SOCS12 demonstrated the presence of multiple ankyrin repeats.

EXAMPLE 24

EXPRESSION PATTERN OF SOCS PROTEINS 

The expression of mRNA from representative members of each class of SOCS proteins was
examined: SOCS1 and SOCS5 from the SH2 domain group, SOCS6 from the WD-40 repeat
group and SOCS7 from the ankyrin repeat group. As shown above, SOCS1 mRNA is found in
abundance in the thymus and at lower levels in other adult tissues.

Since transcription of the SOCS1 gene is induced by cytokines, the inventors sought to
determine whether levels of SOCS5, SOCS6 and SOCS7 mRNA increased upon cytokine
stimulation. In the livers of mice injected with IL-6, SOCS1 mRNA is detectable after 20 min
and decreases to background levels within 2 hours. In contrast, the kinetics of SOCS5 mRNA
expression are quite different, being detectable only 12 to 24 hours after IL-6 injection. SOCS6
mRNA appears to be expressed constitutively, while SOCS7 mRNA was not detected in the
liver either before injection of IL-6 or at any time after injection.

Expression of these genes was also examined after cytokine stimulation of the factor-dependent
cell line FDCP-1 engineered to express bcl-w. Again, SOCS6 mRNA was expressed
constitutively.

EXAMPLE 25 
SOCS4

Mouse and human SOCS4 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). The ESTs derived from mouse and human SOCS4 cDNAs are
tabulated below (Tables 4.1 and 4.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library cloned into λ-bacteriophage. Two cDNAs encoding mouse SOCS4
were isolated and sequenced in their entirety (Figure 15) and shown to overlap the mouse
ESTs identified in the database (Table 4.1 and Figure 17). These cDNAs include part of the
5' untranslated region, the entire mouse SOCS4 coding region and part of the 3' untranslated
region (Figure 17). Analysis of the sequence confirms that the SOCS4 cDNA encodes a
SOCS box at its C-terminus and a series of eight WD-40 repeats before the SOCS box (Figures
16 and 17). The relationship of the two sequence contigs of human SOCS4 (h4.1 and h4.2)
to the experimentally determined mouse SOCS4 cDNA sequence is shown in Figure 17. The
nucleotide sequence of the two human contigs is listed in Figure 18.

SEQ ID NOs: 13 and 14 represent the nucleotide sequence of murine SOCS4 and the
corresponding amino acid sequence. SEQ ID NOs: 15 and 16 are the human SOCS4 cDNA
contigs h4.1 and h4.2, respectively.

EXAMPLE 26

SOCS5 

Mouse and human SOCS5 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). The ESTs derived from mouse and human SOCS5 cDNAs are
tabulated below (Tables 5.1 and 5.2). Using sequence information derived from mouse and
human ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus cDNA library, a mouse genomic DNA library and a human thymus
cDNA library cloned into λ-bacteriophage. A single genomic DNA clone (57-2) and a single
cDNA clone (5-3-2) encoding mouse SOCS5 were isolated, sequenced in their entirety and
shown to overlap with the mouse ESTs identified in the database (Figures 19 and 20A). The
entire coding region of mouse SOCS5, together with part of the 5' and 3' untranslated regions,
appears to be encoded on a single exon (Figure 19). Analysis of the sequence (Figure 20)
confirms that the SOCS5 genomic and cDNA clones encode a protein with a SOCS box at its
C-terminus in addition to an SH2 domain (Figures 19 and 20B). The relationship of the human
SOCS5 contig (h5.1; Figure 21), derived from analysis of cDNA clone 5-94-2 and the human
SOCS5 ESTs (Table 5.2), to the mouse SOCS5 DNA sequence is shown in Figure 19. The
nucleotide sequence and corresponding amino acid sequence of murine SOCS5 are shown in
SEQ ID NOs: 17 and 18, respectively. The human SOCS5 nucleotide sequence is shown in
SEQ ID NO: 19.

EXAMPLE 27

SOCS6 

Mouse and human SOCS6 were recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS6 cDNAs are tabulated below (Tables 6.1 and 6.2). Using sequence information derived from mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus cDNA library. Eight cDNA clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N and 6-5N) encoding mouse SOCS6 were isolated, sequenced in their entirety and shown to overlap with the mouse ESTs identified in the database (Figures 22 and 23A). Analysis of the sequence (Figure 23) confirms that the mouse SOCS6 cDNA clones encode a protein with a SOCS box at its C-terminus in addition to eight WD-40 repeats (Figures 22 and 23B). The relationship of the human SOCS6 contigs (h6.1 and h6.2; Figure 24), derived from analysis of human SOCS6 ESTs (Table 6.2), to the mouse SOCS6 DNA sequence is shown in Figure 22. The nucleotide and corresponding amino acid sequences of murine SOCS6 are shown in SEQ ID NOs: 20 and 21, respectively. SOCS6 human contigs h6.1 and h6.2 are shown in SEQ ID NOs: 22 and 23, respectively.
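Each of these Examples began by searching EST databases with the SOCS box consensus of Figure 13. Neither the consensus itself nor the search program is given in this section. ESTs are single-pass nucleotide reads of unknown frame and orientation, so a protein-motif search has to consider all six reading frames.

The sketch below assumes a hypothetical regular expression, `L..LCR..I`, purely as a stand-in for the Figure 13 consensus; it is not the pattern the inventors used. The helper names are also illustrative. The codon table is the standard genetic code.

```python
import re

BASES = "TCAG"
# Standard genetic code; index = 16*first + 4*second + third over the base order T, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate whole codons; codons containing ambiguous bases (e.g. N) become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_hits(dna: str, motif: str) -> list[tuple[str, int, str]]:
    """Return (frame, residue offset, matched peptide) for every motif hit in all six frames."""
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", revcomp(dna))):
        for offset in range(3):
            for m in re.finditer(motif, translate(seq[offset:])):
                hits.append((f"{strand}{offset + 1}", m.start(), m.group()))
    return hits


HYPOTHETICAL_SOCS_BOX_PATTERN = r"L..LCR..I"  # illustrative stand-in, NOT the Figure 13 consensus

# Codons 174-184 of the SEQ ID NO: 3 coding sequence (Pro-Leu-Gln-Glu-Leu-Cys-Arg-Gln-Arg-Ile-Val).
fragment = "CCGCTGCAGGAGCTGTGTCGCCAGCGCATCGTG"
print(six_frame_hits(fragment, HYPOTHETICAL_SOCS_BOX_PATTERN))           # [('+1', 1, 'LQELCRQRI')]
print(six_frame_hits(revcomp(fragment), HYPOTHETICAL_SOCS_BOX_PATTERN))  # [('-1', 1, 'LQELCRQRI')]
```

The second call mimics a 3' EST, which reads the opposite strand; the hit is recovered on a minus frame.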
EXAMPLE 28
SOCS7 

Mouse and human SOCS7 were recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS7 cDNAs are tabulated below (Tables 7.1 and 7.2). Using sequence information derived from mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus cDNA library. One cDNA clone (74-10A-11) encoding mouse SOCS7 was isolated, sequenced in its entirety and shown to overlap with the mouse ESTs identified in the database (Figures 25 and 26A). Analysis of the sequence (Figure 26) suggests that mouse SOCS7 encodes a protein with a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figures 25 and 26B). The relationship of the human SOCS7 contigs (h7.1 and h7.2; Figure 27), derived from analysis of human SOCS7 ESTs (Table 7.2), to the mouse SOCS7 DNA sequence is shown in Figure 25. The nucleotide and corresponding amino acid sequences of murine SOCS7 are shown in SEQ ID NOs: 24 and 25, respectively. The nucleotide sequences of SOCS7 human contigs h7.1 and h7.2 are shown in SEQ ID NOs: 26 and 27, respectively.

EXAMPLE 29 
SOCS8

ESTs derived from mouse SOCS8 cDNAs are tabulated below (Table 8.1). As described for other members of the SOCS family, it is possible to isolate cDNAs for mouse SOCS8 using sequence information derived from mouse ESTs. The relationship of the ESTs to the predicted coding region of SOCS8 is shown in Figure 28. The nucleotide sequence obtained from the ESTs is shown in Figure 29A, and the partial amino acid sequence of SOCS8 is shown in Figure 29B. The nucleotide sequence and corresponding amino acid sequence of murine SOCS8 are shown in SEQ ID NOs: 28 and 29, respectively.



EXAMPLE 30
SOCS9
Mouse and human SOCS9 were recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS9 cDNAs are tabulated below (Tables 9.1 and 9.2). The relationship of the mouse SOCS9 contig (m9.1; Figure 9.2), derived from analysis of the mouse SOCS9 EST (Table 9.1), to the human SOCS9 DNA contig (h9.1; Figure 32), derived from analysis of human SOCS9 ESTs (Table 9.2), is shown in Figure 31. Analysis of the sequence (Figure 32) indicates that the human SOCS9 cDNA encodes a protein with a SOCS box at its C-terminus, in addition to an SH2 domain (Figure 30). The nucleotide sequence of murine SOCS9 cDNA is shown in SEQ ID NO: 30. The nucleotide sequence of human SOCS9 cDNA is shown in SEQ ID NO: 31.

EXAMPLE 31
SOCS10

Mouse and human SOCS10 were recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS10 cDNAs are tabulated below (Tables 10.1 and 10.2). Using sequence information derived from mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24) encoding mouse SOCS10 were isolated, sequenced in their entirety and shown to overlap with the mouse and human ESTs identified in the database (Figures 33 and 34). Analysis of the sequence (Figure 34) indicates that the mouse SOCS10 cDNA clone is not full length, but that it does encode a protein with a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figure 33). The relationship of the human SOCS10 contigs (h10.1 and h10.2; Figure 35), derived from analysis of human SOCS10 ESTs (Table 10.2), to the mouse SOCS10 DNA sequence is shown in Figure 33. Comparison of mouse cDNA clones and ESTs with human ESTs suggests that the 3' untranslated regions of mouse and human SOCS10 differ significantly. The nucleotide sequence of murine SOCS10 is shown in SEQ ID NO: 32, and the nucleotide sequences of SOCS10 human contigs h10.1 and h10.2 are shown in SEQ ID NOs: 33 and 34, respectively.

EXAMPLE 32

SOCS11
Human SOCS11 was recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from human SOCS11 cDNAs are tabulated below (Tables 11.1 and 11.2). The relationship of the human SOCS11 contig (h11.1; Figures 36A and 36B), derived from analysis of ESTs (Table 11.2), to the predicted encoded protein is shown in Figure 37. Analysis of the sequence indicates that the human SOCS11 cDNA encodes a protein with a SOCS box at its C-terminus, in addition to an SH2 domain (Figures 36B and 37). The nucleotide sequence and corresponding amino acid sequence of human SOCS11 are represented in SEQ ID NOs: 35 and 36, respectively.

EXAMPLE 33

SOCS12 

Mouse and human SOCS12 were recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS12 cDNAs are tabulated below (Tables 12.1 and 12.2). Using sequence information derived from mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24) encoding mouse SOCS12 were isolated, sequenced in their entirety and shown to overlap with the mouse and human ESTs identified in the database (Figures 38 and 39). Analysis of the sequence (Figures 39 and 40) indicates that the SOCS12 cDNA clone encodes a protein with a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figure 38). The relationship of the human SOCS12 contigs (h12.1 and h12.2; Figure 40), derived from analysis of human SOCS12 ESTs (Table 12.2), to the mouse SOCS12 DNA sequence is shown in Figure 38. Comparison of mouse cDNA clones and ESTs with human ESTs suggests that the 3' untranslated regions of mouse and human SOCS12 differ significantly. The nucleotide sequence of SOCS12 is shown in SEQ ID NO: 37. The nucleotide sequences of human SOCS12 contigs h12.1 and h12.2 are shown in SEQ ID NOs: 38 and 39, respectively.

EXAMPLE 34 
SOCS13
Mouse and human SOCS13 were recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS13 cDNAs are tabulated below (Tables 13.1 and 13.2). Using sequence information derived from mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus cDNA library and a mouse embryo cDNA library. Three cDNA clones (62-1, 62-6-7 and 62-14) encoding mouse SOCS13 were isolated, sequenced in their entirety and shown to overlap with the mouse ESTs identified in the database (Figures 41 and 42A). Analysis of the sequence (Figure 42) indicates that the mouse SOCS13 cDNA encodes a protein with a SOCS box at its C-terminus, in addition to a potential WD-40 repeat (Figures 41 and 42B). The relationship of the human SOCS13 contigs (h13.1 and h13.2; Figure 43), derived from analysis of human SOCS13 ESTs (Table 13.2), to the mouse SOCS13 DNA sequence is shown in Figure 41. The nucleotide sequence and corresponding amino acid sequence of murine SOCS13 are shown in SEQ ID NOs: 40 and 41, respectively. The nucleotide sequence of human SOCS13 contig h13.1 is shown in SEQ ID NO: 42.


EXAMPLE 35 
SOCS14 

Mouse and human SOCS14 were recognized by searching EST databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS14 cDNAs are tabulated below (Tables 14.1 and 14.2). Using sequence information derived from mouse and human ESTs, several oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus cDNA library, a mouse genomic DNA library and a human thymus cDNA library cloned into λ-bacteriophage. A single genomic DNA clone (57-2) and a cDNA clone (5-3-2) encoding mouse SOCS14 were isolated, sequenced in their entirety and shown to overlap with the mouse ESTs identified in the database (Figures 44 and 45A). The entire coding region, together with part of the 5' and 3' untranslated regions of mouse SOCS14, appears to be encoded on a single exon (Figure 44). Analysis of the sequence (Figure 45) confirms that the SOCS14 genomic and cDNA clones encode a protein with a SOCS box at its C-terminus in addition to an SH2 domain (Figures 44 and 45B). The relationship of the human SOCS14 contig (h14.1; Figure 14.3), derived from analysis of cDNA clone 5-94-2 and the human SOCS14 ESTs (Table 14.2), to the mouse SOCS14 DNA sequence is shown in Figure 44.

The nucleotide sequence and corresponding amino acid sequence of murine SOCS14 are shown in SEQ ID NOs: 43 and 44, respectively.






wo 98/20023 



PCT/AU97/00729 




EXAMPLE 36 
SOCS15 

Mouse and human SOCS15 were recognized by searching DNA databases using the SOCS box consensus (Figure 13). The ESTs derived from mouse and human SOCS15 cDNAs are tabulated below (Tables 15.1 and 15.2), as are a mouse BAC and a human BAC that contain the entire mouse and human SOCS15 genes. Using sequence information derived from the ESTs and the BACs, it is possible to predict the entire amino acid sequence of SOCS15 and, as described for the other SOCS genes, it is feasible to design specific oligonucleotide probes to allow cDNAs to be isolated. The relationship of the BACs to the ESTs is shown in Figure 46, and the nucleotide and predicted amino acid sequences of SOCS15, derived from the mouse and human BACs, are shown in Figures 47 and 48. The nucleotide sequence and corresponding amino acid sequence of murine SOCS15 are shown in SEQ ID NOs: 46 and 47, respectively. The nucleotide and corresponding amino acid sequences of human SOCS15 are shown in SEQ ID NOs: 48 and 49, respectively.
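The SOCS15 amino acid sequence was predicted from EST and BAC sequence rather than from an isolated cDNA, and the prediction method is not described. One elementary step in such work is locating open reading frames (ORFs). The Python sketch below, with hypothetical function names, reports ATG-to-stop ORFs on the forward strand only. It does not model introns or splice sites, which matter for genomic (BAC) sequence, and it ignores reading frames that run off the end without a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return 0-based half-open (start, end) coordinates of forward-strand ATG...stop ORFs.

    `end` includes the stop codon. Only the first ATG after the previous stop opens an ORF,
    so nested in-frame ATGs are not reported separately.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def longest_orf(dna: str):
    """Longest forward-strand ORF as (start, end), or None if there is none."""
    orfs = find_orfs(dna)
    return max(orfs, key=lambda se: se[1] - se[0]) if orfs else None


print(longest_orf("CCATGAAATTTTAGCC"))  # (2, 14): ATG AAA TTT TAG
```

A real gene prediction from BAC sequence would also scan the reverse strand and use the EST alignments to place exon boundaries.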

EXAMPLE 37 
SOCS INTERACTION WITH JAK2 KINASE 

This Example shows interaction between SOCS proteins and JAK2 kinase. Interaction is mediated via the SH2 domains of SOCS1, SOCS2, SOCS3 and CIS. Interaction with SOCS1 resulted in inhibition of JAK2 kinase activity (Figure 49). The general interaction of JAK2 with SOCS1, SOCS2, SOCS3 and CIS is shown in Figure 50.

The following methods were employed:

Immunoprecipitation: Cos 6 cells were transiently transfected by electroporation and cultured for 48 hours. Cells were then lysed on ice in lysis buffer (50 mM Tris/HCl, pH 7.5, 150 mM NaCl, 1% v/v Triton X-100, 1 mM EDTA, 1 mM NaF, 1 mM Na3VO4) supplemented with complete protease inhibitors (Boehringer Mannheim). Lysates were centrifuged at 4°C (14,000 x g, 10 min) and the supernatant was retained for immunoprecipitation. JAK2 proteins were immunoprecipitated using 5 µl anti-JAK2 antibody (UBI). Antigen-antibody complexes were recovered using protein A-Sepharose (30 µl of a 50% slurry).
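The lysis buffer above is specified only by final concentrations. The following minimal sketch shows the C1V1 = C2V2 arithmetic for making it up from stock solutions. The stock concentrations and the 50 ml batch size are hypothetical choices for illustration, not values from this Example. The protease inhibitors are omitted because their formulation is not given.

```python
# Final concentrations from the lysis buffer above; the stock concentrations are hypothetical.
# Each entry: component -> (final concentration, stock concentration), both in the same unit.
LYSIS_BUFFER = {
    "Tris/HCl pH 7.5 (mM)": (50, 1000),
    "NaCl (mM)": (150, 5000),
    "Triton X-100 (% v/v)": (1, 10),
    "EDTA (mM)": (1, 500),
    "NaF (mM)": (1, 1000),
    "Na3VO4 (mM)": (1, 100),
}


def stock_volumes(total_ml: float, recipe: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Volume of each stock (ml) from C1*V1 = C2*V2, plus water to make up total_ml."""
    volumes = {name: total_ml * final / stock for name, (final, stock) in recipe.items()}
    water = total_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested final concentrations")
    volumes["water"] = water
    return volumes


for name, ml in stock_volumes(50, LYSIS_BUFFER).items():
    print(f"{name:22s} {ml:7.2f} ml")
```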

Western blotting: Immunoprecipitates were analysed by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) under reducing conditions. Protein was then electrophoretically transferred to nitrocellulose, blocked overnight in 10% w/v skim milk and washed in PBS/0.1% v/v Tween-20 (Sigma) (wash buffer). Blots were then incubated for 2 hr with anti-phosphotyrosine antibody (4G10) (1:5000, UBI), anti-FLAG antibody (1.6 µg/ml) or anti-JAK2 antibody (1:2000, UBI), each diluted in wash buffer/1% w/v BSA. Nitrocellulose blots were washed, and primary antibody was detected with either peroxidase-conjugated sheep anti-rabbit immunoglobulin (1:5000, Silenus) or peroxidase-conjugated sheep anti-mouse immunoglobulin (1:5000, Silenus) diluted in wash buffer/1% w/v BSA. Blots were washed and antibody binding was visualised using the enhanced chemiluminescence (ECL) system (Amersham, UK) according to the manufacturer's instructions.

In vitro kinase assay: An in vitro kinase assay was performed to assess intrinsic JAK2 kinase catalytic activity. JAK2 proteins were immunoprecipitated as described, washed twice in kinase assay buffer (50 mM NaCl, 5 mM MgCl2, 5 mM MnCl2, 1 mM NaF, 1 mM Na3VO4, 10 mM HEPES, pH 7.4) and suspended in an equal volume of kinase buffer containing 0.25 µCi/ml [γ-32P]ATP (30 min, room temperature). Excess [γ-32P]ATP was removed and the immunoprecipitates were analysed by SDS-PAGE under reducing conditions. Gels were subjected to mild alkaline hydrolysis by treatment with 1 M KOH (55°C, 2 hours) to remove phosphoserine and phosphothreonine. Radioactive bands were visualised with IMAGEQUANT software on a PhosphorImager system (Molecular Dynamics, Sunnyvale, CA, USA).

EXAMPLE 38 
MAKING SOCS-1 KNOCKOUT CONSTRUCTS 

Diagrams of plasmid constructs and knockout constructs are shown in Figures 51-53. The genomic SOCS-1 clone 95-11-10 was digested with the restriction enzymes BamHI and EcoRI to obtain a 3.6 kb DNA fragment 3' of the coding region (SOCS-1 exon), which was used as the 3' arm in the SOCS-1 knockout vectors. The ends of this fragment were blunted, and the fragment was then ligated into the following vectors, each of which had been linearized at its unique XhoI site and blunted:

pBgalpAloxNeo
and pBgalpAloxNeoTK

This ligation resulted in the following vectors:

3' SOCS-1 arm in pBgalpAloxNeo
and 3' SOCS-1 arm in pBgalpAloxNeoTK

The 5' arm of the SOCS-1 knockout vectors was constructed by using PCR to generate a 2.5 kb product from the genomic SOCS-1 clone 95-11-10, just 5' of the SOCS-1 coding region (SOCS-1 exon). The oligonucleotides used to generate this product were:

5' oligo (sense) (2465)

AGCT AGA TCT GGA CCC TAC AAT GGC AGC [SEQ ID NO:49]

3' oligo (antisense) (2466)

AGCT AG ATC TGC CAT CCT ACT CGA GGG GCC AGC TGG [SEQ ID NO:50]

The PCR product was then digested with the restriction enzyme BglII to generate BglII ends. This 5' SOCS-1 PCR product with BglII ends was then ligated into the vectors 3' SOCS-1 arm in pBgalpAloxNeo and 3' SOCS-1 arm in pBgalpAloxNeoTK, each of which had been linearized at its unique BamHI site. This resulted in the following vectors:

5'&3' SOCS-1 arms in pBgalpAloxNeo
and 5'&3' SOCS-1 arms in pBgalpAloxNeoTK

These were the final SOCS-1 knockout constructs. Both constructs lacked the entire SOCS-1 coding region (SOCS-1 exon), which was replaced with portions of the βgal, β-globin polyA, PGK promoter, neomycin and PGK polyA sequences. The 5'&3' SOCS-1 arms in pBgalpAloxNeoTK vector also contained the thymidine kinase gene sequence, between the neomycin and PGK polyA sequences.
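The cloning steps above depend on the BglII site carried at the 5' end of both PCR oligonucleotides. They also depend on BglII ends being ligatable into a BamHI-cut vector. The sketch below scans the oligonucleotides as printed (spaces removed) for recognition sites and compares the 5' overhangs each enzyme leaves. The enzyme table uses the standard recognition sequences and top-strand cut positions for these four enzymes; the function names are hypothetical.

```python
# Standard recognition sequences with the top-strand cut offset (A^GATCT -> 1, and so on).
ENZYMES = {
    "BglII": ("AGATCT", 1),
    "BamHI": ("GGATCC", 1),
    "XhoI": ("CTCGAG", 1),
    "EcoRI": ("GAATTC", 1),
}

# Oligonucleotides 2465 and 2466 as printed above, spaces removed.
OLIGO_2465 = "AGCTAGATCTGGACCCTACAATGGCAGC"          # sense
OLIGO_2466 = "AGCTAGATCTGCCATCCTACTCGAGGGGCCAGCTGG"  # antisense


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` on the given strand (sufficient for palindromic sites)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut `offset` bases in from each 5' end."""
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


for name, oligo in (("2465", OLIGO_2465), ("2466", OLIGO_2466)):
    for enzyme, (site, _) in ENZYMES.items():
        hits = find_sites(oligo, site)
        if hits:
            print(name, enzyme, hits)

# BglII and BamHI both leave 5'-GATC overhangs, so BglII ends can ligate into a BamHI-cut vector.
print(overhang("BglII"), overhang("BamHI"))  # GATC GATC
```

The scan reports a BglII site at position 4 of each oligonucleotide. It also reports an XhoI site in oligonucleotide 2466, which is simply what the printed sequence contains.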
The vectors 5'&3' SOCS-1 arms in pBgalpAloxNeo and 5'&3' SOCS-1 arms in pBgalpAloxNeoTK were linearized at the unique NotI site and then transfected into embryonic stem cells by electroporation. Neomycin-resistant clones were selected and analysed by Southern blot to determine whether they contained the correctly integrated SOCS-1 targeting sequence. To determine whether correct integration had occurred, genomic DNA from the neomycin-resistant clones was digested with the restriction enzyme EcoRI. The digested DNA was then blotted onto nylon filters and probed with a 1.5 kb EcoRI/HindIII DNA fragment located further 5' than the 5' arm sequence used in the knockout constructs. The band sizes expected for correct integration were:

Wild-type SOCS-1 allele: 5.4 kb

SOCS-1 knockout allele: 8.2 kb in cells transformed with 5'&3' SOCS-1 arms in pBgalpAloxNeo, or 11 kb in cells transformed with 5'&3' SOCS-1 arms in pBgalpAloxNeoTK.
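The expected bands above can be turned into a simple screening rule. The following is a minimal sketch, assuming band sizes are read off the blot in kb and that a hypothetical tolerance of ±0.3 kb is acceptable. It treats a clone showing both the 5.4 kb wild-type band and the construct-specific knockout band as correctly targeted on one allele, which is the usual outcome of gene targeting in ES cells. The function name and category labels are illustrative only.

```python
# Expected EcoRI fragment sizes (kb) detected by the external 5' probe, from the text above.
WILD_TYPE_KB = 5.4
KNOCKOUT_KB = {
    "pBgalpAloxNeo": 8.2,
    "pBgalpAloxNeoTK": 11.0,
}


def call_clone(bands_kb: list[float], construct: str, tol_kb: float = 0.3) -> str:
    """Classify a neomycin-resistant clone from its observed band sizes (hypothetical rule)."""
    def present(size: float) -> bool:
        return any(abs(b - size) <= tol_kb for b in bands_kb)

    wild_type = present(WILD_TYPE_KB)
    targeted = present(KNOCKOUT_KB[construct])
    if wild_type and targeted:
        return "targeted (heterozygous)"
    if targeted:
        return "targeted band only"
    if wild_type:
        return "wild-type only"
    return "no expected band"


print(call_clone([5.4, 8.2], "pBgalpAloxNeo"))     # targeted (heterozygous)
print(call_clone([5.5, 10.9], "pBgalpAloxNeoTK"))  # targeted (heterozygous)
print(call_clone([5.4], "pBgalpAloxNeoTK"))        # wild-type only
```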

Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in this specification, individually or collectively, and any and all combinations of any two or more of said steps or features.
Table 4.1

Summary of ESTs derived from mouse SOCS-4 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-4 Mouse mc65f04 5' EST0549700 d13.5-14.5 mouse embryo m4.1

mf42e06 EST0593477 d13.5-14.5 mouse embryo m4.1

mp10c10 5' EST0747905 d8.5 mouse embryo m4.1

mr81g09 5' EST0783081 d13 embryo m4.1

mt19h12 5' EST0816531 spleen m4.1


Table 4.2

Summary of ESTs derived from human SOCS-4 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-4 Human 27b5 5' EST0534081 retina h4.2

30d2 5' EST0534315 retina h4.2

J0159F 5' EST0461188 foetal heart h4.2

J3802F 5' EST0461428 foetal heart h4.2

EST19523 5' EST0958884 retina h4.2

EST51149 5' EST1011015 placenta h4.2

EST180909 5' EST0951375 Jurkat T-lymphocyte h4.2

EST182619 5' EST0953220 Jurkat T-lymphocyte h4.1

ya99h09 3' EST0103262 placenta h4.2

ye70c04 5' EST0172673 foetal liver/spleen h4.2

yh53c09 5' EST0197390 placenta h4.2

3' EST0197391 h4.2
yh77g11

yh87h05 

yi45h07 
yj04e06 

yql2h06 
yq56a06 
yq60e02 

yq92g03 

yq97h06 

yr90f01
yt69c03 

yv30a08 
yv55f07 

yv57h09 

yv87h02 
yv98ell 

yweSdlO 
yw82a03 



5' EST0203418 placenta 

3' EST0203419



5' EST0204888 
3' EST0204773 



5' 

5' 
3' 

5' 

3' 

5' 
3' 



5'

5' 
3' 

3' 

5' 
3' 

5' 
3' 

5' 

5' 
3' 

5' 

5' 



placenta 



EST0246604 placenta 
placenta 



EST0258541 
EST0258285 

EST0309968 

EST0346924 

EST0347259 
EST0347209 



foetal liver spleen 
foetal liver spleen 
foetal liver spleen 



5' EST0355932 foetal liver spleen 

3' EST0355884 

5' EST0357618 foetal liver spleen 

3' EST0357416 



EST0372402 foetal liver spleen 
foetal liver spleen 



EST0338395 
EST0338303 

EST0458506 

EST0465391 
EST0463331 

EST0464336 
EST0458765 

EST0388085 

EST0400679 
EST0400680 

EST0441370 

EST0463005 



foetal liver spleen 
foetal liver spleen 

foetal liver spleen 

melanocyte 
melanocyte 

placenta (8-9 wk) 
placenta (8-9 wk) 



h4.2 
h4.1 

h4.1 
h4.1 

h4.2 

h4.1 
h4.1 

h4.2 

h4.2 

h4.2 
h4.2 

h4.2 
h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
yx08a07
yx72h06

yx76b09
yy37h08
yy66b02

za81f08
zb18f07
zc06e08

zd14g06
zd51h12
zd52b09

ze25g11
ze69f02

zf54f03
zh96e07

zv66h12
zs83a08 

zs83g08 



3' 
3'

5' 

3' 

5' 
5' 
5' 



EST0433678 



EST0407016 melanocyte



5' 
3' 

3' 

3' 

5' 
3' 

y 

5' 
3' 



5' 
3' 

5' 

5' 

3' 

5' 

3'



EST0435158 
EST0422871 

EST0434011 

EST0451704 

EST0505446

EST0511777 

EST0485315 

EST0540473 
EST0540354 

EST0564666 

EST0578099 

EST0582012 
EST0581958 

EST0679543 

EST0635563 
EST0635472 

EST0680111

EST0616241 
EST0615745 

EST1043265

EST0920072 

EST0920016 

EST0920121 

EST0920122 



melanocyte
melanocyte

melanocyte

melanocyte

multiple sclerosis 
lesion 

foetal lung 

foetal lung 



h4.1 

h4.1 

h4.2 
h4.1 

h4.2 

h4.2

h4.2 

h4.2 
h4.1 



parathyroid tumor h4.1
h4.1 



foetal heart 
foetal heart 
foetal heart 

foetal heart 
retina 

retina 



h4.1 

h4.1 

h4.1 
h4.1 

h4.1 

h4.2
h4.1 

h4.2 



foetal liver spleen h4.2 
h4.2 



8-9w foetus 



h4.2 



germinal centre B h4.1
cell 

h4.1 



germinal centre B h4.1
cell 



h4.1 
Table 5.1

Summary of ESTs derived from mouse SOCS-5 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-5 Mouse mc55a01 5' EST0541556 d13.5-14.5 mouse embryo m5.1

mh98f09 5' EST0638237 placenta m5.1

my26h12 5' EST0859939 mixed organs m5.1

ve24e06 5' EST0819106 heart m5.1

Table 5.2

Summary of ESTs derived from human SOCS-5 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-5 Human EST15B103 ? EST0258029 adipose tissue h5.1

EST15B105 ? EST0258028 adipose tissue h5.1

EST27530 5' EST0965892 cerebellum h5.1

zf50f01 5' EST0679820 retina h5.1

Table 6.1 

Summary of ESTs derived from mouse SOCS-6 cDNAs 

SOCS Species EST name End EST no Library source Contig 

SOCS-6 Mouse mc04c05 5' EST0525832 d19.5 embryo m6.1

md48a03 5' EST0566730 d13.5-14.5 embryo m6.1

mf31d03 5' EST0675970 d13.5-14.5 embryo m6.1

mh26b07 5' EST0628752 d13.5-14.5 placenta m6.1

mh78e11 5' EST0637608 d13.5-14.5 placenta m6.1

mh88h09 5' EST0644383 d13.5-14.5 placenta m6.1
mh94h07

mi27h04 

mj29c05 

mp66g04 

mw75g03 

va53b05 

vb34h02 

vc55d07 

vc59e05 

vc67d03 

vc68d10

vc97h01

vc99c08

vd07h03

vd05c01

vd09b12

vd19b02

vd29a04 

vd46d06 



5' 
5' 
5' 
5' 
5' 
5' 
5' 
3' 
3' 
3' 
3' 
3' 
3' 
3' 
3' 
3' 
3' 
3' 



EST0638078 d13.5-14.5 placenta

EST0644252 d13.5-14.5 embryo

EST0664093 d13.5-14.5 embryo

EST0757905 thymus 

EST0847938 liver 



EST0901540 
EST0930132 
EST1057735
EST1058201
EST1057849
EST1058663
EST1059343
EST1059410
EST1058173
EST1058275
EST1058632
EST1059723
? none found 



d12.5 embryo
lymph node 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 



3' ? none found 



m6.1 
m6.1
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
m6.1 
Table 6.2

Summary of ESTs derived from human SOCS-6 cDNAs

SOCS Species EST name End EST no 

SOCS-6 Human 

yf61e08 5' EST0184387 

yf93a09 5' EST0186084 

yg05f12 5' EST0191486

yg41f04 5' EST0195017

yg45c02 5' EST0185308

yh11f10 5' EST0236705

yh13b05 5'



zc35a12

ze02h08

zl09a03

zl69e10
zn39d08 



zo39e06 5' 



EST0237191 
EST0236958 

EST0555518 

EST0603826 
EST0603718 

EST0773936 
EST0773892 

EST0683363 

EST0718885 

EST0785947 



Library source 

d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 



senescent 
fibroblasts 

foetal heart 



pregnant uterus 
colon 

endothelial cell 
endothelial cell 



Contig 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 
h6.2 

h6.1

h6.1 
h6.2 

h6.1 
h6.1 

h6.1 

h6.1 

h6.1 



Table 7.1

Summary of ESTs derived from mouse SOCS-7 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-7 Mouse mj39a01 5' EST0665627 d13.5/14.5 embryo m7.1

vi52h07 5' EST1267404 d7.5 embryo m7.1
Table 7.2

Summary of ESTs derived from human SOCS-7 cDNAs

SOCS Species EST name End EST no Library source 

SOCS-7 HUMAN STSWI-30171 (G21563) Chromosome 2 

EST00939 5' EST0000906 hippocampus 

EST12913 3' EST0944382 uterus 



yc29b05 
yp49f10
zt10f03

zx73g04



3' EST0128727 liver



3' EST0301914 retina



3' EST0921231 



Contig 

h7.2 
h7.1 
h7.2 
h7.2 
h7.2 



5' EST0922932 germinal centre h7.2
B cell



h7.1



3' EST1102975 ovarian tumour h7.1



Table 8.1

Summary of ESTs derived from mouse SOCS-8 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-8 Mouse mj16e09 rl EST0666240 d13.5/14.5 embryo m8.1

vj27a029 rl EST1155973 heart m8.1



Table 9.1

Summary of ESTs derived from mouse SOCS-9 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-9 Mouse me65d05 5' EST0585211 d13.5/14.5 embryo m9.1



Table 9.2 

Summary of ESTs derived from human SOCS-9 cDNAs



SOCS Species EST name
SOCS-9 Human CSRL-83f2-u
EST114054



End EST no 

(B06659) 



5' EST0939759 placenta 
yy06b07
yy06g06 
zr40c09 

zr72h01 

yx92c08 
yx93b08 
hfe0662 



3' 
5' 
5' 

5' 
3' 

5' 
5' 
5' 



EST0434504 melanocyte h9.1

EST0443783 melanocyte h9.1 

EST0832461 melanocyte, heart, h9.1 
uterus 

EST0892025 melanocyte, heart, h9.1 
uterus 



EST0892026 

EST0441160 melanocyte 



h9.1 
h9.1 

EST0441260 melanocyte h9.1 
EST0889611 foetal heart h9.1



Table 10.1 

Summary of ESTs derived from mouse SOCS-10 cDNAs 



SOCS Species EST name 
Mouse mb14d12
mb40f06
mg89b11
mq89e12
mp03g12
vh53c11



End EST no 



5' 



5' 



5' 



5' 



EST0549887 



EST0515064



EST0630631 



EST0776015 



EST0741991 



EST1154634



Library source Contig 

d19.5 embryo m10.1

d19.5 embryo m10.1

d13.5-14.5 embryo m10.1

heart m10.1

heart m10.1

mammary gland m10.1



Table 10.2

Summary of ESTs derived from human SOCS-10 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-10 Human aa48h10 3' EST1135220 germinal centre B cell h10.2

zp35h01 3' EST0819137 muscle h10.2

zp97h12 5' EST0835442 muscle h10.2
zq08h01



3' EST08312n 



5' EST0835907 muscle 



zr34g05 5' EST0834251 melanocyte, heart, 

uterus 

3' EST0834440 



EST73000 5 
HSDHEI005 ? 



EST1004491 ovary
EST0013906 heart 



Table 11.1 

Summary of ESTs derived from human SOCS-11 cDNAs
SOCS Species EST name End EST no 

SOCS-11 Human zt24h06 rl EST0925023 



zr43b02 



rl 
si 



EST0873006 
EST0872954 



Table 12.1 

Summary of ESTs derived from mouse SOCS-12 cDNAs 
SOCS Species EST name End EST no 



Library source 



ovarian tumor 



h10.2
h10.1
h10.2
h10.2
h10.2
h10.2



Contig
h11.1



melanocyte, heart, uterus h11.1
h11.1



Library source Contig 



SOCS-12 Mouse EST03803 5' EST1054173 day 7.5 embryo ectoplacental cone m12.1

mt18f02 5' EST0817652 BNbMS spleen m12.1

mz60g10 5' EST0890872 lymph node m12.1

va05c11 5' EST0909449 lymph node m12.1



Table 12.2

Summary of ESTs derived from human SOCS-12 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-12 Human STS-SHGC-13867 Chromosome 2 h12.2

EST177695 5' EST0948071 Jurkat cells h12.1
EST64550
EST76868
PMY2369
yb38f04

yg74e12
yh13g04

yh48b06
yh53a05

yn48h09

yn90a09
yo08f03

yo11e01
yo63b12

yq56g02
zh57c04
zh79h01
zh99a11
zo92h12

zs48c01 



5' 
5' 
5' 
5' 

y 

5' 

5' 

3' 



5' 
3' 

5' 

3' 

3' 

5'
3' 

3' 

5' 
3' 

3' 

3' 

3' 

3' 

5' 

3' 

5' 

3' 



EST0997367 Jurkat cells

EST1007291 pineal body

EST1115998 KG-1

EST0108807 foetal spleen

EST0224407 d73 brain

EST0237226 d73 brain 
EST0236992 



yh48b06 



placenta 



EST0197282 placenta
EST0197486

EST0278258 brain 
EST0278259 

EST0302557 brain 

EST0301790 brain 
EST0302059 

? none found 

EST0303606 breast 
EST0304085 

EST0346935 foetal liver spleen 

EST0594201 foetal liver spleen

EST0598945 foetal liver spleen

EST0618570 foetal liver spleen

EST0803392 ovarian cancer
EST0803393

EST0925714 germinal centre
B cell

EST0925530 



h12.1

h12.2

h12.1

h12.1
h12.2

h12.1

h12.1
h12.2

h12.2

h12.2
h12.2

h12.2
h12.2

h12.2

h12.2
h12.2

h12.2

h12.2
h12.2

h12.1

h12.2

h12.2

h12.2

h12.1
h12.2

h12.1
h12.2
zs45h02



EST0932296 germinal centre h12.2
B cell



Table 13.1

Summary of ESTs derived from mouse SOCS-13 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-13 Mouse ma39c09 5' EST0517875 day 19.5 embryo m13.1

me60c05 5' EST0584950 day 13.5/14.5 embryo m13.1

mi78g05 5' EST0653834 day 19.5 embryo m13.1

mk10c11 5' EST0735158 day 19.5 embryo m13.1

mo48g12 5' EST0745111 day 10.5 embryo m13.1

mp94a01 5' EST0762827 thymus m13.1

vb57c07 5' EST1028976 day 11.5 embryo m13.1

vh07c11 5' EST1117269 mammary gland m13.1



Table 13.2 

Summary of ESTs derived from human SOCS-13 cDNAs 
SOCS Species EST name End EST no Library source Contig 

SOCS-13 Human EST59161 5' EST0992726 infant brain h13.1



Table 14.1 

Summary of ESTs derived from mouse SOCS-14 cDNAs 
SOCS Species EST name End EST no Library source Contig 



SOCS-14 Mouse mi75e03 5' EST0651892 d19.5 embryo m14.1

vd29hn 5' EST1067080 2 cell embryo m14.1

vd53g07 5' EST1119627 2 cell embryo m14.1



Table 15.1

Summary of ESTs derived from mouse SOCS-15 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-15 Mouse mh29b05 5' EST0628834 placenta m15.1

mh98h09 EST0638243 placenta m15.1

ml45a02 EST0687171 testis m15.1

mu43a10 EST851588 thymus m15.1

my38c09 EST878461 pooled organs m15.1

vj37h07 EST1174791 diaphragm m15.1

AC002393 Chromosome 6 BAC m15.1



Table 15.2

Summary of ESTs derived from human SOCS-15 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-15 Human EST98889 5' EST1026568 thyroid h15.1

ne48b05 3' EST1138057 colon tumour h15.1

yb12h12 5' EST0098885 placenta h15.1

3' EST0098886 h15.1

HSU47924 Chromosome 12 BAC h15.1
(2) INFORMATION FOR SEQ ID NO : 1 :

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 1 : 
CACGCCGCCC ACGTGAAGGC 20

(2) INFORMATION FOR SEQ ID NO : 2 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 
TTCGCCAATG ACAAGACGCT 20

(2) INFORMATION FOR SEQ ID NO : 3 :

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1236 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME /KEY: CDS 

(B) LOCATION: 1..636 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 



CGAGGCTCAA GCTCCGGGCG GATTCTGCGT GCCGCTCTCG CTCCTTGGGG TCTGTTGGCC -101
GGCCTGTGCC ACCCGGACGC CCGGCTCACT GCCTCTGTCT CCCCCATCAG CGCAGCCCCG -41
GACGCTATGG CCCACCCCTC CAGCTGGCCC CTCGAGTAGG -1

ATG GTA GCA CGC AAC CAG GTG GCA GCC GAC AAT GCG ATC TCC CCG GCA 48
Met Val Ala Arg Asn Gln Val Ala Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

GCA GAG CCC CGA CGG CGG TCA GAG CCC TCC TCG TCC TCG TCT TCG TCC 96
Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser
20 25 30

TCG CCA GCG GCC CCC GTG CGT CCC CGG CCC TGC CCG GCG GTC CCA GCC 144 
Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala

35 40 45 

CCA GCC CCT GGC GAC ACT CAC TTC CGC ACC TTC CGC TCC CAC TCC GAT 192 

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp 
50 55 60 

TAC CGG CGC ATC ACG CGG ACC AGC GCG CTC CTG GAC GCC TGC GGC TTC 240

Tyr Arg Arg lie Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe 
65 70 75 80 

TAT TGG GGA CCC CTG AGC GTG CAC GGG GCG CAC GAG CGG CTG CGT GCC 28 8 

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala 
85 90 95 

GAG CCC GTG GGC ACC TTC TTG GTG CGC GAC AGT CGT CAA CGG AAC TGC 33 6 

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gin Arg Asn Cys 

100 105 110 

TTC TTC GCG CTC AGC GTG AAG ATG GCT TCG GGC CCC ACG AGC ATC CGC 384

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg

115 120 125 

GTG CAC TTC CAG GCC GGC CGC TTC CAC TTG GAC GGC AGC CGC GAG ACC 432 

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140 

TTC GAC TGC CTT TTC GAG CTG CTG GAG CAC TAC GTG GCG GCG CCG CGC 480 

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg 
145 150 155 160 

CGC ATG TTG GGG GCC CCG CTG CGC CAG CGC CGC GTG CGG CCG CTG CAG 528

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

GAG CTG TGT CGC CAG CGC ATC GTG GCC GCC GTG GGT CGC GAG AAC CTG 576

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu

180 185 190 

GCG CGC ATC CCT CTT AAC CCG GTA CTC CGT GAC TAC CTG AGT TCC TTC 624 

Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe

195 200 205 



CCC TTC CAG ATC TGA CCGGCTG CCGCTGTGCC GCAGCATTAA GTGGGGGCGC 676
Pro Phe Gln Ile *
210

CTTATTATTT CTTATTATTA ATTATTATTA TTTTTCTGGA ACCACGTGGG AGCCCTCCCC 736

GCCTGGGTCG GAGGGAGTGG TTGTGGAGGG TGAGATGCCT CCCACTTCTG GCTGGAGACC 796

TCATCCCACC TCTCAGGGGT GGGGGTGCTC CCCTCCTGGT GCTCCCTCCG GGTCCCCCCT 856

GGTTGTAGCA GCTTGTGTCT GGGGCCAGGA CCTGAATTCC ACTCCTACCT CTCCATGTTT 916

ACATATTCCC AGTATCTTTG CACAAACCAG GGGTCGGGGA GGGTCTCTGG CTTCATTTTT 976

CTGCTGTGCA GAATATCCTA TTTTATATTT TTACAGCCAG TTTAGGTAAT AAACTTTATT 1036

ATGAAAGTTT TTTTTTAAAA GAAAAAAAAA AAAAAAAAA 1075



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



Met Val Ala Arg Asn Gln Val Ala Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala 
35 40 45 

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp 
50 55 60 

Tyr Arg Arg lie Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe 
65 70 75 80 

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala 
85 90 95 

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110 

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125 

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140 

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg
145 150 155 160 

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190 

Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205 

Pro Phe Gln Ile
210 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1121 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 223..819

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCGATCTGTG GGTGACAGTG TCTGCGAGAG ACTTTGCCAC ACCATTCTGC CGGAATTTGG 60 

AGAAAAAGAA CCAGCCGCTT CCAGTCCCCT CCCCCTCCGC CACCATTTCG GACACCCTGC 120 

ACACTCTCGT TTTGGGGTAC CCTGTGACTT CCAGGCAGCA CGCGAGGTCC ACTGGCCCCA 180 

GCTCGGGCGA CCAGCTGTCT GGGACGTGTT GACTCATCTC CC ATG ACC CTG CGG 234

Met Thr Leu Arg 
1 

TGC CTG GAG CCC TCC GGG AAT GGA GCG GAC AGG ACG CGG AGC CAG TGG 282

Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr Arg Ser Gln Trp
5 10 15 20 

GGG ACC GCG GGG TTG CCG GAG GAA CAG TCC CCC GAG GCG GCG CGT CTG 330 
Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu Ala Ala Arg Leu
25 30 35 

GCG AAA GCG CTG CGC GAG CTC AGT CAA ACA GGA TGG TAC TGG GGA AGT 378

Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp Tyr Trp Gly Ser
40 45 50 

ATG ACT GTT AAT GAA GCC AAA GAG AAA TTA AAA GAG GCT CCA GAA GGA 426

Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu Ala Pro Glu Gly 
55 60 65 

ACT TTC TTG ATT AGA GAT AGT TCG CAT TCA GAC TAC CTA CTA ACT ATA 474 
Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr Leu Leu Thr Ile
70 75 80 

TCC GTT AAG ACG TCA GCT GGA CCG ACT AAC CTG CGG ATT GAG TAC CAA 522 
Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg Ile Glu Tyr Gln
85 90 95 100 

GAT GGG AAA TTC AGA TTG GAT TCT ATC ATA TGT GTC AAG TCC AAG CTT 570 
Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val Lys Ser Lys Leu
105 110 115 

AAA CAG TTT GAC AGT GTG GTT CAT CTG ATT GAC TAC TAT GTC CAG ATG 618 
Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr Tyr Val Gln Met
120 125 130 

TGC AAG GAT AAA CGG ACA GGC CCA GAA GCC CCA CGG AAT GGG ACT GTT 666
Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg Asn Gly Thr Val 
135 140 145 

CAC CTG TAC CTG ACC AAA CCT CTG TAT ACA TCA GCA CCC ACT CTG CAG 714 
His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala Pro Thr Leu Gln
150 155 160 

CAT TTC TGT CGA CTC GCC ATT AAC AAA TGT ACC GGT ACG ATC TGG GGA 762

His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly Thr Ile Trp Gly
165 170 175 180 

CTG CCT TTA CCA ACA AGA CTA AAA GAT TAC TTG GAA GAA TAT AAA TTC 810 
Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu Glu Tyr Lys Phe 
185 190 195 

CAG GTA TAAGTATTTC TCTCTCTTTT TCGTTTTTTT TTAAAAAAAA AAAAACACAT 866 
Gln Val



GCCTCATATA GACTATCTCC GAATGCAGCT ATGTGAAAGA GAACCCAGAG GCCCTCCTCT 926 
GGATAACTGC GCAGAATTCT CTCTTAAGGA CAGTTGGGCT CAGTCTAACT TAAAGGTGTG 986
AAGATGTAGC TAGGTATTTT AAAGTTCCCC TTAGGTAGTT TTAGCTGAAT GATGCTTTCT 1046
TTCCTATGGC TGCTCAAGAT CAAATGGCCC TTTTAAATGA AACAAAACAA AACAAAACAA 1106 
AAAAAAAAAA AAAAA 1121



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 198 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Thr Leu Arg Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr 
1 5 10 15

Arg Ser Gln Trp Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu
20 25 30 

Ala Ala Arg Leu Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp
35 40 45 

Tyr Trp Gly Ser Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu
50 55 60 

Ala Pro Glu Gly Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr
65 70 75 80 

Leu Leu Thr Ile Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg
85 90 95 

Ile Glu Tyr Gln Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val
100 105 110 

Lys Ser Lys Leu Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr
115 120 125 

Tyr Val Gln Met Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg
130 135 140 

Asn Gly Thr Val His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala
145 150 155 160

Pro Thr Leu Gln His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly
165 170 175 

Thr Ile Trp Gly Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu
180 185 190 

Glu Tyr Lys Phe Gln Val
195 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2187 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 18..695



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CGCTGGCTCC GTGCGCC ATG GTC ACC CAC AGC AAG TTT CCC GCC GCC GGG 50 
Met Val Thr His Ser Lys Phe Pro Ala Ala Gly 
1 5 10

ATG AGC CGC CCC CTG GAC ACC AGC CTG CGC CTC AAG ACC TTC AGC TCC 98 
Met Ser Arg Pro Leu Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser 
15 20 25 

AAA AGC GAG TAC CAG CTG GTG GTG AAC GCC GTG CGC AAG CTG CAG GAG 146

Lys Ser Glu Tyr Gln Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu
30 35 40 

AGC GGA TTC TAC TGG AGC GCC GTG ACC GGC GGC GAG GCG AAC CTG CTG 194 
Ser Gly Phe Tyr Trp Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu 
45 50 55 

CTC AGC GCC GAG CCC GCG GGC ACC TTT CTT ATC CGC GAC AGC TCG GAC 242

Leu Ser Ala Glu Pro Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp
60 65 70 75 

CAG CGC CAC TTC TTC ACG TTG AGC GTC AAG ACC CAG TCG GGG ACC AAG 290 
Gln Arg His Phe Phe Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys
80 85 90 

AAC CTA CGC ATC CAG TGT GAG GGG GGC AGC TTT TCG CTG CAG AGT GAC 338

Asn Leu Arg Ile Gln Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp
95 100 105 

CCC CGA AGC ACG CAG CCA GTT CCC CGC TTC GAC TGT GTA CTC AAG CTG 386

Pro Arg Ser Thr Gln Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu
110 115 120 

GTG CAC CAC TAC ATG CCG CCT CCA GGG ACC CCC TCC TTT TCT TTG CCA 434 
Val His His Tyr Met Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro 
125 130 135 

CCC ACG GAA CCC TCG TCC GAA GTT CCG GAG CAG CCA CCT GCC CAG GCA 482 
Pro Thr Glu Pro Ser Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala
140 145 150 155 

CTC CCC GGG AGT ACC CCC AAG AGA GCT TAC TAC ATC TAT TCT GGG GGC 530

Leu Pro Gly Ser Thr Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly
160 165 170 

GAG AAG ATT CCG CTG GTA CTG AGC CGA CCT CTC TCC TCC AAC GTG GCC 578 
Glu Lys Ile Pro Leu Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala
175 180 185 

ACC CTC CAG CAT CTT TGT CGG AAG ACT GTC AAC GGC CAC CTG GAC TCC 626 
Thr Leu Gln His Leu Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser
190 195 200 

TAT GAG AAA GTG ACC CAG CTG CCT GGA CCC ATT CGG GAG TTC CTG GAT 674 
Tyr Glu Lys Val Thr Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp
205 210 215 

CAG TAT GAT GCT CCA CTT TAAGGAGCAA AAGGGTCAGA GGGGGGCCTG 722

Gln Tyr Asp Ala Pro Leu
220 225 
GGTCGGTCGG TCGCCTCTCC TCCGAGGCAC ATGGCACAAG CACAAAAATC CAGCCCCAAC 782

GGTCGGTAGC TCCCAGTGAG CCAGGGGCAG ATTGGCTTCT TCCTCAGGCC CTCCACTCCC 842 

GCAGAGTAGA GCTGGCAGGA CCTGGAATTC GTCTGAGGGG AGGGGGAGCT GCCACCTGCT 902 

TTCCCCCCTC CCCCAGCTCC AGCTTCTTTC AAGTGGAGCC AGCCGGCCTG GCCTGGTGGG 962 

ACAATACCTT TGACAAGCGG ACTCTCCCCT CCCCTTCCTC CACACCCCCT CTGCTTCCCA 1022

AGGGAGGTGG GGACACCTCC AAGTGTTGAA CTTAGAACTG CAAGGGGAAT CTTCAAACTT 1082 

TCCCGCTGGA ACTTGTTTGC GCTTTGATTT GGTTTGATCA AGAGCAGGCA CCTGGGGGAA 1142 

GGATGGAAGA GAAAAGGGTG TGTGAAGGGT TTTTATGCTG GCCAAAGAAA TAACCACTCC 1202 

CACTGCCCAA CCTAGGTGAG GAGTGGTGGC TCCTGGCTCT GGGGAGAGTG GCAAGGGGTG 1262 

ACCTGAAGAG AGCTATACTG GTGCCAGGCT CCTCTCCATG GGGCAGCTAA TGAAACCTCG 1322 

CAGATCCCTT GCACCCCAGA ACCCTCCCCG TTGTGAAGAG GCAGTAGCAT TTAGAAGGGA 1382

GACAGATGAG GCTGGTGAGC TGGCCGCCTT TTCCAACACC GAAGGGAGGC AGATCAACAG 1442 

ATGAGCCATC TTGGAGCCCA GGTTTCCCCT GGAGCAGATG GAGGGTTCTG CTTTGTCTCT 1502 

CCTATGTGGG GCTAGGAGAC TCGCCTTAAA TGCCCTCTGT CCCAGGGATG GGGATTGGCA 1562

CACAAGGAGC CAAACACAGC CAATAGGCAG AGAGTTGAGG GATTCACCCA GGTGGCTACA 1622 

GGCCAGGGGA AGTGGCTGCA GGGGAGAGAC CCAGTCACTC CAGGAGACTC CTGAGTTAAC 1682 

ACTGGGAAGA CATTGGCCAG TCCTAGTCAT CTCTCGGTCA GTAGGTCCGA GAGCTTCCAG 1742 

GCCCTGCACA GCCCTCCTTT CTCACCTGGG GGGAGGCAGG AGGTGATGGA GAAGCCTTCC 1802 

CATGCCGCTC ACAGGGGCCT CACGGGAATG CAGCAGCCAT GCAATTACCT GGAACTGGTC 1862 

CTGTGTTGGG GAGAAACAAG TTTTCTGAAG TCAGGTATGG GGCTGGGTGG GGCAGCTGTG 1922 

TGTTGGGGTG GCTTTTTTCT CTCTGTTTTG AATAATGTTT ACAATTTGCC TCAATCACTT 1982 

TTATAAAAAT CCACCTCCAG CCCGCCCCTC TCCCCACTCA GGCCTTCGAG GCTGTCTGAA 2042 

GATGCTTGAA AAACTCAACC AAATCCCAGT TCAACTCAGA CTTTGCACAT ATATTTATAT 2102 

TTATACTCAG AAAAGAAACA TTTCAGTAAT TTATAATAAA AGAGCACTAT TTTTTAATGA 2162 
AAAAAAAAAA AAAAAAAAAA AAAAA 2187



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 225 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Val Thr His Ser Lys Phe Pro Ala Ala Gly Met Ser Arg Pro Leu 
1 5 10 15 

Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser Lys Ser Glu Tyr Gln
20 25 30 
Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu Ser Gly Phe Tyr Trp
35 40 45 

Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu Leu Ser Ala Glu Pro 
50 55 60 

Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp Gln Arg His Phe Phe
65 70 75 80 

Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys Asn Leu Arg Ile Gln
85 90 95 

Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp Pro Arg Ser Thr Gln
100 105 110 

Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu Val His His Tyr Met 
115 120 125 

Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro Pro Thr Glu Pro Ser 
130 135 140 

Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala Leu Pro Gly Ser Thr
145 150 155 160 

Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly Glu Lys Ile Pro Leu
165 170 175 

Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala Thr Leu Gln His Leu
180 185 190 

Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser Tyr Glu Lys Val Thr 
195 200 205 

Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp Gln Tyr Asp Ala Pro
210 215 220 

Leu 
225 



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1094 base pairs 
(B) TYPE: nucleic acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:



CTCCGGCTGG CCCCTTCTGT AGGATGGTAG CACACAACCA GGTGGCAGCC GACAATGCAG 60 

TCTCCACAGC AGCAGAGCCC CGACGGCGGC CAGAACCTTC CTCCTCTTCC TCCTCCTCGC 120

CCGCGGCCCC CGCGCGCCCG CGGCCGTGCC CCGCGGTCCC GGCCCCGGCC CCCGGCGACA 180 

CGCACTTCCG CACATTCCGT TCGCACGCCG ATTACCGGCG CATCACGCGC GCCAGCGCGC 240 

TCCTGGACGC CTGCGGATTC TACTGGGGGC CCCTGAGCGT GCACGGGGCG CACGAGCGGC 300

TGCGCGCCGA GCCCGTGGGC ACCTTCCTGG TGCGCGACAG CCGCCAGCGG AACTGCTTTT 360

TCGCCCTTAG CGTGAAGATG GCCTCGGGAC CCACGAGCAT CCGCGTGCAC TTTCAGGCCG 420 

GCCGCTTTCA CCTGGATGGC AGCCGCGAGA GCTTCGACTG CCTCTTCGAG CTGCTGGAGC 480
ACTACGTGGC GGCGCCGCGC CGCATGCTGG GGGCCCCGCT GCGCCAGCGC CGCGTGCGGC 540

CGCTGCAGGA GCTGTGCCGC CAGCGCATCG TGGCCACCGT GGGCCGCGAG AACCTGGCTC 600 

GCATCCCCCT CAACCCCGTC CTCCGCGACT ACCTGAGCTC CTTCCCCTTC CAGATTTGAC 660 

CGGCAGCGCC CGCCGTGCAC GCAGCATTAA CTGGGATGCC GTGTTATTTT GTTATTACTT 720

GCCTGGAACC ATGTGGGTAC CCTCCCCGGC CTGGGTTGGA GGGAGCGGAT GGGTGTAGGG 780 

GCGAGGCGCC TCCCGCCCTC GGCTGGAGAC GAGGCCGCAG ACCCCTTCTC ACCTCTTGAG 840

GGGGTCCTCC CCCTCCTGGT GCTCCCTCTG GGTCCCCCTG GTTGTTGTAG CAGCTTAACT 900

GTATCTGGAG CCAGGACCTG AACTCGCACC TCCTACCTCT TCATGTTTAC ATATACCCAG 960 

TATCTTTGCA CAAACCAGGG GTTGGGGGAG GGTCTCTGGC TTTATTTTTC TGCTGTGCAG 1020

AATCCTATTT TATATTTTTT AAAGTCAGTT TAGGTAATAA ACTTTATTAT GAAAGTTTTT 1080 

TTTTTTAAAA AAAA 1094



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Val Ala His Asn Gln Val Ala Ala Asp Asn Ala Val Ser Thr Ala
1 5 10 15

Ala Glu Pro Arg Arg Arg Pro Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

Pro Ala Ala Pro Ala Arg Pro Arg Pro Cys Pro Ala Val Pro Ala Pro 
35 40 45 

Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ala Asp Tyr 
50 55 60 

Arg Arg Ile Thr Arg Ala Ser Ala Leu Leu Asp Ala Cys Gly Phe Tyr
65 70 75 80 

Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala Glu 
85 90 95 
Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys Phe
100 105 110 

Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg Val
115 120 125 

His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Ser Phe
130 135 140 

Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg Arg 
145 150 155 160 

Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln Glu
165 170 175 

Leu Cys Arg Gln Arg Ile Val Ala Thr Val Gly Arg Glu Asn Leu Ala
180 185 190 

Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe Pro
195 200 205 

Phe Gln Ile
210 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2807 base pairs 

(B) TYPE: nucleic acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:



GGAAACCGAG GCGGGGAGAC CAGGAGGCCT TGGCCTCAGA GCTTCAGAGT CGCGTGGCAG 60

CAAACAGAGA AACCTGTAGA GGGCAGTGTG CGTCACTTAG CTCAGGGAAG CTGCACGCGA 120

AACTCACCCG CCTTCATTCA TAAACATCGT CAGCTAGGCA CCTACTCCTG GGCTTTCAGG 180

ACAAACTGAA TCACGAAACC ACAGTGTCCT TAAAATAGGT CTGACCGCCT GAATCCCTGG 240

CCAAGGTGTG TACGGGGCAT GGGAGCCCTT GTGCAGAGAT GCTTGCAGGA GCCTTGAGGG 300

GCTCTGTAAG ACAGAGGCTA GGAAGACAAA GTTGGGGGCT ACAGCTTCTT GTCCTGCCCG 360

GGGCCTCAGT TTCTTCGGTT GCCCACGTAG GAGTGCAGAG AGTCCAGCCC CTGGGGACCC 420

AACCCAACCC CGCCCAGTTT CCGAGGAACT CGTCCGGGAG CGGGGGCGCC CCTCCCGCAC 480

CGCCTTAGGC TTCCTTTGAA GCCTCTGCGG TCAGGCCACC GCTTCCTGGG AAGCCCAAGC 540

CAAGGCCAGG CCGAGTGGCC AACGGGAGGG GCCCGCGCGC [illegible] 600

GCCCCACAGG TCTCCAGGGC TGGCTAGCCG GGCTCCTAGA GCGGAGAPTR [illegible] 660

CGGGTCCTGG GCAGGAAGGA TCCTGGCAGG GAGGAGTTGC TTGGGGRGTG [illegible] 720

GCTCCAGGCG CGGTGGAGCT CTGACCAGGA GAATGCACAC APTPPPAPPP [illegible] 780

GTCAGCCCCA AGCTAGCATC CCACCCGGGG AGCAGCGATG [illegible] 840

CAAAAGAGCA GGCACCAGGT GACACGAAAC AGAAGATTCC [illegible] AGAACCCCAG 900

AAGTCCCATT CAGGGAAGGT GCGAGGCGAG AACGAGTTAn [illegible] TCCAGGGGCA 960

GCCAAAGAAA TCTAAAGAGA ACCCGAAGGA [illegible] AAAGCGGCGG 1020

TGGGCGGGAT CGGTGGGCGG GGCCTCCCTG [illegible] GGCGGGCAGC 1080

AGCAGAGAGA ACTGCGGCCG TGGCAGCGGC [illegible] ATGCGCGACA 1140

GCAGCCCCGG AACCCCCAGC CQCGGCGCCC [illegible] GCCGAGGCAG 1200

CTGCGAAGGA GCAGGCGGGA GGGGATGRGA [illegible] CAGGACTATC 1260

CTCGCAGACT GCATGGCGGG GTCGTGGATG [illegible] CCACCGGCTG 1320

GCCCAGGCGG CCCCTCGCGC GCGCGRGGPG [illegible] GCCCTGAGCC 1380

CGGATCGTCC GCCCGGGTTC CAGTTPrrGG [illegible] CGCGAGGCGG 1440

CAAGCCACCC AGCGGGGACG GCCTGGAGTC [illegible] TTCTCCACGC 1500

GCGCGGGGAG GCAGGGCTCC ACCGCCARTP [illegible] GAACGGCCTA 1560

CTTCGCAGAT GAGCCCACCG AGGCTCARGP [illegible] CACCCTCGCT 1620

CCTTGGGGTC CGCTGGCCGG CCTGTGCCAC [illegible] CTCTGTCTCC 1680

CCCATCAGCG CAGCCCCGGA CGCTATGGCC [illegible] CGAGTAGGAT 1740

GGTAGCACGT AACCAGGTGG AAGCCGAPAA [illegible] AGCCCCGACG 1800

GCGGCCAGAG CCATCCTCGT CCTCGTCTTC [illegible] CGCGTCCCCG 1860

GCCCTGCCCG GTGGTCCCGG CCCCGGCTCC [illegible] CCTTCCGCTC 1920

CCACTCTGAT TACCGGCGCA TCACGCGGAC [illegible] GCGGCTTCTA 1980

CTGGGGACCC CTGAGCGTGC ATGGGGCGCA PPAAPPPPTP [illegible] CCGTGGGCAC 2040

CTTCTTGGTG CGCGACAGTC GCCAGCGGAA PTnPTTPTTP [illegible] TGAAGATGGC 2100

TTCGGGCCCC ACGAGCATTC GTGTGCACTT [illegible] TGGACGGCAA 2160

CCGCGAGACC TTCGACTGCC TCTTCGAGCT GCTGGAGPAP [illegible] CGCCGCGCCG 2220

CATGTTGGGG GCCCCACTGC GCCAGCGCCG [illegible] TGTGTCGCCA 2280

GCGCATCGTG GCCGCCGTGG GTCGCGAGAA [illegible] ACCCGGTACT 2340

CCGTGACTAC CTGAGTTCCT TCCCCTTCCA GATCTGACCG [illegible] 2400


ATTAAGTGGG AGCGCCTTAT TATTTCTTAT TATTAATTAT TATTATTTTT CTGGAACCAC 2460

GTGGGAGCCC TCCCCGCCTA GGTCGGAGGG AGTGGGTGTG GAGGGTGAGA TCCCTCCCAC 2520

TTCTGGCTGG AGACCTTATC CCGCCTCTCG GGGGGCCTCC CCTCCTGGTG CTCCCTCCCG 2580

GTCCCCCTGG TTGTAGCAGC TTGTGTCTGG GGCCAGGACC TGAACTCCAC GCCTACCTCT 2640

CCATGTTTAC ATGTTCCCAG TATCTTTGCA CAAACCAGGG GTGGGGGAGG GTCTCTGGCT 2700

TCATTTTTCT GCTGTGCAGA ATATTCTATT TTATATTTTT ACATCCAGTT TAGATAATAA 2760

ACTTTATTAT GAAAGTTTTT TTTTTTAAAG AAACAAAGAT TTCTAGA 2807



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Val Ala Arg Asn Gln Val Glu Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ser Glu Pro Arg Arg Arg Pro Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

Ser Pro Ala Ala Pro Ala Arg Pro Arg Pro Cys Pro Val Val Pro Ala 
35 40 45 

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp 
50 55 60 

Tyr Arg Arg lie Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe 
65 70 75 80 

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ser 
85 90 95 

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110 

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125 

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Asn Arg Glu Thr
130 135 140 

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg 
145 150 155 160 

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190 
Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205 

Pro Phe Gln Ile
210 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1611 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 263..1529



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CGAATTCCGG GCGGGCTGTG TGAGTCTGTG AGTGGAAGGC GCGCCGGCTC TTTTGTCTGA 60 

GTGTGACCCG GTGGCTTTGT TCCAGGCATT CCGGTGATTT CCTCCGGGCA GTCCGCAGAA 120 

GCCGCAGCGG CCGCCCGCGC TCTCTCTGCA GTCTCCACAC CCGGGAGAGC CTGAGCCCGC 180 

GTCACGCCCC TCAGCCCCCG CTGAGTCCCT TCTCTGTTGT CGCGTCCGAA TCGAGTTCCC 240 

GGAATCAGAC GGTGCCCCAT AG ATG GCC AGC TTT CCC CCG AGG GTT AAC GAG 292 

Met Ala Ser Phe Pro Pro Arg Val Asn Glu 
1 5 10

AAA GAG ATC GTG AGA TCA CGT ACT ATA GGG GAA CTC TTG GCT CCA GCA 340 
Lys Glu Ile Val Arg Ser Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala
15 20 25 

GCT CCT TTT GAC AAG AAA TGT GGT GGT GAG AAC TGG ACG GTT GCT TTT 388 
Ala Pro Phe Asp Lys Lys Cys Gly Gly Glu Asn Trp Thr Val Ala Phe 
30 35 40 

GCT CCT GAT GGT TCC TAC TTT GCG TGG TCA CAA GGA TAT CGC ATA GTG 436
Ala Pro Asp Gly Ser Tyr Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val
45 50 55

AAG CTT GTC CCG TGG TCC CAG TGC CGT AAG AAC TTT CTT TTG CAT GGT 484
Lys Leu Val Pro Trp Ser Gln Cys Arg Lys Asn Phe Leu Leu His Gly
60 65 70

TCC AAA AAT GTT ACC AAT TCA AGC TGT CTA AAA TTG GCA AGA CAA AAC 532
Ser Lys Asn Val Thr Asn Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn
75 80 85 90

AGT AAT GGT GGT CAG AAA AAC AAG CCT CCT GAG CAC GTT ATA GAC TGT 580
Ser Asn Gly Gly Gln Lys Asn Lys Pro Pro Glu His Val Ile Asp Cys
95 100 105

GGA GAC ATA GTC TGG AGT CTT GCT TTT GGG TCT TCA GTT CCA GAA AAA 628
Gly Asp Ile Val Trp Ser Leu Ala Phe Gly Ser Ser Val Pro Glu Lys
110 115 120

CAG AGT CGT TGC GTT AAT ATA GAA TGG CAT CGG TTC CGA TTT GGA CAG 676
Gln Ser Arg Cys Val Asn Ile Glu Trp His Arg Phe Arg Phe Gly Gln
125 130 135

GAT CAG CTA CTC CTT GCC ACA GGA TTA AAC AAT GGT CGC ATC AAA ATC 724
Asp Gln Leu Leu Leu Ala Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile
140 145 150

TGG GAT GTA TAT ACA GGA AAA CTC CTC CTT AAT TTG GTA GAC CAC ATT 772
Trp Asp Val Tyr Thr Gly Lys Leu Leu Leu Asn Leu Val Asp His Ile
155 160 165 170

GAA ATG GTT AGA GAT TTA ACT TTT GCT CCA GAT GGG AGC TTA CTC CTT 820
Glu Met Val Arg Asp Leu Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu
175 180 185

GTA TCA GCT TCA AGA GAC AAA ACT CTA AGA GTG TGG GAC CTG AAA GAT 868
Val Ser Ala Ser Arg Asp Lys Thr Leu Arg Val Trp Asp Leu Lys Asp
190 195 200

GAT GGA AAC ATG GTG AAA GTA TTG CGG GCA CAT CAG AAT TGG GTG TAC 916
Asp Gly Asn Met Val Lys Val Leu Arg Ala His Gln Asn Trp Val Tyr
205 210 215

AGT TGT GCA TTC TCT CCC GAC TGT TCT ATG CTG TGT TCA GTG GGC GCC 964
Ser Cys Ala Phe Ser Pro Asp Cys Ser Met Leu Cys Ser Val Gly Ala
220 225 230

AGT AAA GCA GTT TTC CTT TGG AAT ATG GAT AAA TAC ACC ATG ATT AGG 1012
Ser Lys Ala Val Phe Leu Trp Asn Met Asp Lys Tyr Thr Met Ile Arg
235 240 245 250 

AAG CTG GAA GGT CAT CAC CAT GAT GTT GTA GCT TGT GAC TTT TCT CCT 1060 
Lys Leu Glu Gly His His His Asp Val Val Ala Cys Asp Phe Ser Pro 
255 260 265

GAT GGA GCA TTG CTA GCT ACT GCA TCC TAT GAC ACT CGT GTG TAT GTC 1108 
Asp Gly Ala Leu Leu Ala Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val 
270 275 280 

TGG GAT CCA CAC AAT GGA GAC CTT CTG ATG GAG TTT GGG CAC CTG TTT 1156 
Trp Asp Pro His Asn Gly Asp Leu Leu Met Glu Phe Gly His Leu Phe 
285 290 295 

CCC TCG CCC ACT CCA ATA TTT GCT GGA GGA GCA AAT GAC CGA TGG GTG 1204

Pro Ser Pro Thr Pro Ile Phe Ala Gly Gly Ala Asn Asp Arg Trp Val
300 305 310 

AGA GCT GTG TCT TTC AGT CAT GAT GGA CTG CAT GTT GCC AGC CTT GCT 1252
Arg Ala Val Ser Phe Ser His Asp Gly Leu His Val Ala Ser Leu Ala
315 320 325 330 

GAT GAT AAA ATG GTG AGG TTC TGG AGA ATC GAT GAG GAT TGT CCG GTA 1300

Asp Asp Lys Met Val Arg Phe Trp Arg Ile Asp Glu Asp Cys Pro Val
335 340 345 

CAA GTT GCA CCT TTG AGC AAT GGT CTT TGC TGT GCC TTT TCT ACT GAT 1348

Gln Val Ala Pro Leu Ser Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp
350 355 360 

GGC AGT GTT TTA GCT GCT GGG ACA CAT GAT GGA AGT GTG TAT TTT TGG 1396 
Gly Ser Val Leu Ala Ala Gly Thr His Asp Gly Ser Val Tyr Phe Trp 
365 370 375 

GCC ACT CCA AGG CAA GTC CCT AGC CTT CAA CAT ATA TGT CGC ATG TCA 1444 
Ala Thr Pro Arg Gln Val Pro Ser Leu Gln His Ile Cys Arg Met Ser
380 385 390 

ATC CGA AGA GTG ATG TCC ACC CAA GAA GTC CAA AAA CTG CCT GTT CCT 1492 
Ile Arg Arg Val Met Ser Thr Gln Glu Val Gln Lys Leu Pro Val Pro
395 400 405 410

TCC AAA ATA TTG GCG TTT CTC TCC TAC CGC GGT TAG A CTGAAGACTG 1539
Ser Lys Ile Leu Ala Phe Leu Ser Tyr Arg Gly *
415 420 

CCTTTCCTGG TAGGCCTGCC AGACAGAGCG CCCTTTACAA GACACACCTC AAGCTTTACC 1599 

TCGTGCCGAA TT 1611 



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: 

Met Ala Ser Phe Pro Pro Arg Val Asn Glu Lys Glu Ile Val Arg Ser
1 5 10 15

Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala Ala Pro Phe Asp Lys Lys
20 25 30 

Cys Gly Gly Glu Asn Trp Thr Val Ala Phe Ala Pro Asp Gly Ser Tyr 
35 40 45 

Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val Lys Leu Val Pro Trp Ser
50 55 60 

Gln Cys Arg Lys Asn Phe Leu Leu His Gly Ser Lys Asn Val Thr Asn
65 70 75 80 

Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn Ser Asn Gly Gly Gln Lys
85 90 95 

Asn Lys Pro Pro Glu His Val Ile Asp Cys Gly Asp Ile Val Trp Ser
100 105 110 

Leu Ala Phe Gly Ser Ser Val Pro Glu Lys Gln Ser Arg Cys Val Asn
115 120 125 
Ile Glu Trp His Arg Phe Arg Phe Gly Gln Asp Gln Leu Leu Leu Ala
130 135 140 

Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile Trp Asp Val Tyr Thr Gly
145 150 155 160 

Lys Leu Leu Leu Asn Leu Val Asp His Ile Glu Met Val Arg Asp Leu
165 170 175 

Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu Val Ser Ala Ser Arg Asp 
180 185 190 

Lys Thr Leu Arg Val Trp Asp Leu Lys Asp Asp Gly Asn Met Val Lys 
195 200 205 

Val Leu Arg Ala His Gln Asn Trp Val Tyr Ser Cys Ala Phe Ser Pro
210 215 220 

Asp Cys Ser Met Leu Cys Ser Val Gly Ala Ser Lys Ala Val Phe Leu 
225 230 235 240 

Trp Asn Met Asp Lys Tyr Thr Met Ile Arg Lys Leu Glu Gly His His
245 250 255 

His Asp Val Val Ala Cys Asp Phe Ser Pro Asp Gly Ala Leu Leu Ala 
260 265 270 

Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val Trp Asp Pro His Asn Gly 
275 280 285 

Asp Leu Leu Met Glu Phe Gly His Leu Phe Pro Ser Pro Thr Pro Ile
290 295 300 

Phe Ala Gly Gly Ala Asn Asp Arg Trp Val Arg Ala Val Ser Phe Ser 
305 310 315 320 

His Asp Gly Leu His Val Ala Ser Leu Ala Asp Asp Lys Met Val Arg 
325 330 335 

Phe Trp Arg Ile Asp Glu Asp Cys Pro Val Gln Val Ala Pro Leu Ser
340 345 350 

Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp Gly Ser Val Leu Ala Ala 
355 360 365 
Gly Thr His Asp Gly Ser Val Tyr Phe Trp Ala Thr Pro Arg Gln Val
370 375 380 

Pro Ser Leu Gln His Ile Cys Arg Met Ser Ile Arg Arg Val Met Ser
385 390 395 400 

Thr Gln Glu Val Gln Lys Leu Pro Val Pro Ser Lys Ile Leu Ala Phe
405 410 415 

Leu Ser Tyr Arg Gly * 
420 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 783 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

CTGTCTTCCT CCGCAGCGCG AGGCTGGGTA CAGGGTCTAT TGTCTGTGGT TGACTCCGTA 60 

CTTTGGTCTG AGGCCTTCGG GAGCTTTCCC GAGGCAGTTA GCAGAAGCCG CAGCGACCGC 120 

CCCCGCCCGT CTCCTCTGTC CCTGGGCCCG GGAGACAAAC TTGGCGTCAC GCCCTCAGCG 180 

GTCGCCACTC TCTTCTCTGT TGTTGGGTCC GCATCGTATT CCCGGAATCA GACGGTGCCC 240

CATAGATGGC CAGCTTTCCC CCGAGGGTCA ACGAGAAAGA GATCGTGAGA TCACGTACTA 300 

TAGGTGAACT TTTAGCTCCT GCAGCTCCTT TTGACAAGAA ATGTGGTCGT GAAAATTGGA 360 

CTGTTGCTTT TGCTCCAGAT GGTTCATACT TTGCTTGGTC ACAAGGACAT CGCACAGTAA 420

AGCTTGTTCC GTGGTCCCAG TGCCTTCAGA ACTTTCTCTT GCATGGCACC AAGAATGTTA 480 

CCAATTCAAG CAGTTTAAGA TTGCCAAGAC AAAATAGTGA TGGTGGTCAG AAAAATAAGC 540

CTCGTGACAT ATTATAGACT GTGGAGATAT AGTCTGGAGT CTTGCTTTTG GGTCATCAGT 600 
TCCAGAAAAA CAGAGTCGCT GTGTAAATAT AGAATGGCAT CGCTTCAGAT TTGGACAAGA 660

TCAGCTACTT CTTGCTACAG GGTTGAACAA TGGGCGTATC AAAATATGGG ATGTATATCA 720

GGAAACTCCT CCTTAACTTG GTAGATCATA CTGAAGTGGT CAGAGATTTA ACTTTTGCTC 780



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1122 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CTCTGTATGT CTGAATGAAG CTATAACATT TGCCTTTTTA TTGCAGGTTT TCCTTTGGAA 60 

TATGGATAAA TACACCATGA TACGGAAACT AGAAGGACAT CACCATGATG TGGTAGCTTG 120

TGACTTTTCT CCTGATGGAG CATTACTGGC TACTGCATCT TATGATACTC GAGTATATAT 180

CTGGGATCCA CATAATGGAG ACATTCTGAT GGAATTTGGG CACCTGTTTC CCCCACCTAC 240 

TCCAATATTT GCTGGAGGAG CAAATGACCG GTGGGTACGA TCTGTATCTT TTAGCCATGA 300 

TGGACTGCAT GTTGCAAGCC TTGCTGATGA TAAAATGGTG AGGTTCTGGA GAATTGATGA 360 

GGATTATCCA GTGCAAGTTG CACCTTTGAG CAATGGTCTT TGCTGTGCCT TCTCTACTGA 420

TGGCAGTGTT TTAGCTGCTG GGACACATGA CGGAAGTGTG TATTTTTGGG CCACTCCACG 480 

GCAGGTCCCT AGCCTGCAAC ATTTATGTCG CATGTCAATC CGAAGAGTGA TGCCCACCCA 540

AGAAGTTCAG GAGCTGCCGA TTCCTTCCAA GCTTTTGGAG TTTCTCTCGT ATCGTATTTA 600 

GAAGATTCTG CCTTCCCTAG TAGTAGGGAC TGACAGAATA CACTTAACAC AAACCTCAAG 660 

CTTTACTGAC TTCAATTATC TGTTTTTAAA GACGTAGAAG ATTTATTTAA TTTGATATGT 720

TCTTGTACTG CATTTTGATC AGTTGAGCTT TTAAAATATT ATTTATAGAC AATAGAAGTA 780

TTTCTGAACA TATCAAATAT AAATTTTTTT AAAGATCTAA CTGTGAAAAC ATACATACCT 840

GTACATATTT AGATATAAGC TGCTATATGT TGAATGGACC CTTTTGCTTT TCTGATTTTT 900

AGTTCTGACA TGTATATATT GCTTCAGTAG AGCCACAATA TGTATCTTTG CTGTAAAGTG 960

CAAGGAAATT TTAAATTCTG GGACACTGAG TTAGATGGTA AATACTGACT TACGAAAGTT 1020

GAATTGGGTG AGGCGGGCAA ATCACCTGAG GTCAGCAGTT TGAGACTAGC CTGGCAAACA 1080

TGATGAAACC CTGTCTCTAC TAAAAATACA AAAAAAAAAA AA 1122



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2537 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 422..2029



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CGGCACGAGC CGGGCTCCGT CCGGAGGAAG CGAGGCTGCG CCGCCGGCCC GGCAGGAGCG 60 

GAGGACGGGA GCGCGGGCGG TCGCGCTCGC CCTGTCGCTG ACTGCGCTGC CCCGGCCCAT 120 

CCTTGCCTGG CCGCAGGTGC CCTGGATGAG GCCGCCGCGC GTGTCCCGGC CGCTGAGTGT 180 

CCCCCGCGGT CGCCCGGCGC CTGCCCTCAA GCGGCCGCCT CTCCTTGCCC GGGTCCCCGT 240 

TTTCCCCCGG CGCAGTCCTC CTCCGGTGGG CGCCTCCGCA CCTCGGCGCA GGCGGCACGG 300

CCCTCGGGCC GGGATGGATC CGCCGGGAAG AGGAAGACAA GCCGGGGCGT TGAGCCCCTG 360
CGCACGGTGC CGCCGCGCGT AGTGGGAGCT TACTCGCAGT AGGCTCTCGC TCTTCTAATC 420

A ATG GAT AAA GTG GGG AAA ATG TGG AAC AAC TTA AAA TAC AGA TGC 466 
Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys 
1 5 10 15

GAG AAT CTC TTC AGC CAC GAG GGA GGA AGC CGT AAT GAG AAC GTG GAG 514
Gln Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu
20 25 30 

ATG AAC CCC AAC AGA TGT CCG TCT GTC AAA GAG AAA AGC ATC AGT CTG 562 
Met Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser Ile Ser Leu
35 40 45 

GGA GAG GCA GCT CCC CAG CAA GAG AGC AGT CCC TTA AGA GAA AAT GTT 610 
Gly Glu Ala Ala Pro Gln Gln Glu Ser Ser Pro Leu Arg Glu Asn Val
50 55 60 

GCC TTA CAG CTG GGA CTG AGC CCT TCC AAG ACC TTT TCC AGG CGG AAC 658 
Ala Leu Gln Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn
65 70 75

CAA AAC TGT GCC GCA GAG ATC CCT CAA GTG GTT GAA ATC AGC ATC GAG 706
Gln Asn Cys Ala Ala Glu Ile Pro Gln Val Val Glu Ile Ser Ile Glu
80 85 90 95

AAA GAC AGT GAC TCG GGT GCC ACC CCA GGA ACG AGG CTT GCA CGG AGA 754 
Lys Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg 
100 105 110 

GAC TCC TAC TCG CGG CAC GCC CCG TGG GGA GGA AAG AAG AAA CAT TCC 802 
Asp Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser 
115 120 125 

TGT TCC ACA AAG ACC CAG AGT TCA TTG GAT ACC GAG AAA AAG TTT GGT 850 
Cys Ser Thr Lys Thr Gln Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly
130 135 140 

AGA ACT CGA AGC GGC CTT CAG AGG CGA GAG CGG CGC TAT GGA GTC AGC 898 
Arg Thr Arg Ser Gly Leu Gln Arg Arg Glu Arg Arg Tyr Gly Val Ser
145 150 155 

TCC ATG CAG GAC ATG GAC AGC GTT TCT AGC CGC GCG GTC GGG AGC CGC 946 
Ser Met Gln Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg
160 165 170 175
TCC CTG AGG CAG AGG CTC CAG GAC ACG GTG GGT TTG TGT TTT CCC ATG 994
Ser Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met
180 185 190 



AGA ACT TAC AGC AAG CAG TCA AAG CCA CTC TTT TCC AAT AAA AGA AAA 1042
Arg Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys
195 200 205 



ATA CAT CTT TCT GAA TTA ATG CTG GAG AAA TGC CCT TTT CCT GCT GGC 1090 
Ile His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly
210 215 220 



TCG GAT TTA GCA CAA AAG TGG CAT TTG ATT AAA CAG CAT ACC GCC CCT 1138 
Ser Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro
225 230 235 



GTG AGC CCA CAC TCA ACA TTT TTT GAT ACA TTT GAT CCA TCA CTG GTG 1186 
Val Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val 
240 245 250 255 



TCT ACA GAA GAT GAA GAA GAT AGG CTT CGC GAG AGA AGA CGG CTT AGT 1234
Ser Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser
260 265 270 



ATC GAA GAA GGG GTG GAT CCC CCT CCC AAC GCA CAA ATA CAC ACC TTT 1282
Ile Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe
275 280 285 



GAA GCT ACT GCA CAG GTC AAC CCA TTG TAT AAG CTG GGA CCA AAG TTA 1330
Glu Ala Thr Ala Gln Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu
290 295 300 



GCT CCT GGG ATG ACA GAG ATA AGT GGA GAT GGT TCT GCA ATT CCA CAA 1378 
Ala Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln
305 310 315 



GCA ATT GTG ACT CAG AAG AGG ATT CAA CCA CCC TAT GTC TGC AGT CAC 1426
Ala Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His
320 325 330 335 



GGA GGC AGA AGC AGC GCC AGG TGT CCG GGG ACA GCC ACG CGC ACG TTA 1474 
Gly Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu 
340 345 350 



GCA GAC AGG GAG CTT GGA AAG TTC ATA CGC AGA TCG ATT ACA TAC ACT 1522 
Ala Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr
355 360 365 

GCC TCG TGC CAG ATT TGC TTC AGA TCA CAG GGA ATC CCT GTT ACT GGG 1570 
Ala Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly
370 375 380 

GCG TGA TGG ACC GAT ACG AGG CCG AAG CCC TTC TAG AAG GGA AAC CGG 1618 
Ala * Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg 
385 390 395 

AAG GCA CGT TCT TGC TCA GGG ACT CTG CAC AGG AGG ACT ACC TCT TCT 1666 
Lys Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser 
400 405 410 415 

CTG TGA GCT TCC GCC GCT ACA ACA GGT CTC TGC ACG CCC GGA TCG AGC 1714 
Leu * Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser 
420 425 430 

AGT GGA ACC ACA ACT TCA GCT TCG ATG CCC ATG ACC CCT GCG TGT TTC 1762 
Ser Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe 
435 440 445 

ACT CCT CCA CGT CAC GGG GCT TCT CGA ACA CTA TAA AGA CCC CAG CTC 1810 
Thr Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu
450 455 460 

TTG CAT GTT TTT TGA ACC GTT GCT AAC GAT ATC ACT GAA TAG AAC TTT 1858 
Leu His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe
465 470 475 

CCC TTT CAG CCT GCA GTA TAT CTG CCG CGC AGT GAT CTG CAG ATG CAC 1906 
Pro Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His
480 485 490 495 

TAG GTA TGA TGG GAT TGA CGG GCT CCC GCT ACC GTC GAT GTT ACA GGA 1954
Tyr Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly
500 505 510 

TTT TTT AAA AGA GTA TCA TTA TAA ACA AAA AGT TAG GGT TCG CTG GTT 2002 
Phe Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val 
515 520 525 

AGA ACG AGA CCA GTC AAA GCA AAG TAACTCCTGT CCCCAAAGGG CACTAACTAA 2056
Arg Thr Arg Pro Val Lys Ala Lys
530 535

GTCTGCTCCT CCCGTGCATC GAACTGCACC CATAGGAGGC AGTCAGCTGC TAGGATTTCC 2116 

CACCCAGAAT GGGAGCTTAG TCATTAGCCT CTGCCCTATG GGGTCCGCTG TTCCTCAGAC 2176

AAAGGTGCCT AGGGACAGCA AGATGGCTTG CAGGTGTTCG GTGGGCTGTG ACAACTGAGG 2236

GAGGCAACTC TGGGGCATTT GCTATGAAGA ATTCTATTTC TTACCGAAGA ACAAATTATT 2296 

AATATTGGAT GGGTATTTCA ATAGTGTGAC TAATGTTTGA AATTATTTTT TCTAAGAATT 2356 

TTTCTATAAC CTTCAGAAAA AGTAGTGATG TTTGTAGTTA CTATAAATCA AGCTTTGAAA 2416 

GTTCAAAACA AACAAGTTAA ATAAAAGACT ACCTTCCTTT TAGAGAAAAC AAATGCAAGT 2476

TTTCCCAGCC ACAGGCATTG TGCACTGTTA ATGTTGCTTG TTATCAGCTC CTTTCTCCTC 2536 

C 2537 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 535 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys Gln
1 5 10 15

Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu Met 
20 25 30 

Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser Ile Ser Leu Gly
35 40 45 

Glu Ala Ala Pro Gln Gln Glu Ser Ser Pro Leu Arg Glu Asn Val Ala
50 55 60 

Leu Gln Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn Gln
65 70 75 80

Asn Cys Ala Ala Glu Ile Pro Gln Val Val Glu Ile Ser Ile Glu Lys
85 90 95 

Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg Asp 
100 105 110 

Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser Cys 
115 120 125 

Ser Thr Lys Thr Gln Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly Arg
130 135 140 

Thr Arg Ser Gly Leu Gln Arg Arg Glu Arg Arg Tyr Gly Val Ser Ser
145 150 155 160 

Met Gln Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg Ser
165 170 175 

Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met Arg
180 185 190 

Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys Ile
195 200 205 

His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly Ser 
210 215 220 

Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro Val
225 230 235 240 

Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val Ser 
245 250 255 

Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser Ile
260 265 270 

Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe Glu
275 280 285 

Ala Thr Ala Gin Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu Ala 
290 295 300 



Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln Ala
305 310 315 320

Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His Gly
325 330 335 

Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu Ala 
340 345 350 

Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr Ala
355 360 365 

Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly Ala
370 375 380 

* Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg Lys 
385 390 395 400 

Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser Leu 
405 410 415 

* Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser Ser 

420 425 430 

Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe Thr 
435 440 445 

Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu Leu
450 455 460 

His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe Pro
465 470 475 480 

Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His Tyr
485 490 495 

Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly Phe
500 505 510 

Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val Arg 
515 520 525 

Thr Arg Pro Val Lys Ala Lys 
530 535 
(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1221 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GATTAAACAG CATACAGCTC CTGTGAGCCC ACATTCAACA TTTTTTGATA CTTTGATCCA 60

TCTTTGGTTT CTACAGAAGA TGAAGAAGAT AGGCTTAGAG AGAGAAGGCG GCTTAGTATT 120

GAAGAAGGGG TTGATCCCCC TCCCAATGCA CAAATACATA CATTTGAAGC TACTGCACAG 180 

GTTAATCCAT TATTAAACTG GGACCAAAAT TAGCTCCTGG AATGACTGAA ATAAGTGGGG 240

ACAGTTCTGC AATTCCACAA GCTAATTGTG ACTCGGAAGA GGATACAACC ACCCTGTGTT 300 

GCAGTCACGG AGGCAGAAGC AGCGTCAGAT ATCTGGAGAC AGCCATACCC ATGTTAGCAG 360

ACAGGGAGCT TGGAAAGTCC ACACACAGAT TGATTACATA CACTGCTTCG TGCCTGATTT 420

GCTTCAAATT ACAGGGAATC CCTGTTACTG GGGAGTGATG GACCGTTATG AAGCAGAAGC 480 

CCTTCTCGAA GGGAAACCTG AAGGCACGTT TTTGCTCAGG GACTCTGCGC AAGAGGACTA 540

CTTCTTCTCT GTGAGCTTCC GCCGATACAA CAGATCCCTG CATGCCCGAA TTGAGCAGTG 600 

GAATCACAAC TTTAGTTTCG ACGCCCATGA CCCGTGTGTA TTTCACTCCT CCACTGTAAC 660 

GGGACTTTTA GAACATTATA AAGATCCCAG TTCGTGCATG TTTTTTGAAC CATTGCTTAC 720

TATATCACTA AATAGGACTT TCCCTTTTAG CCTGCAGTAT ATCTGTCGCG CGGTAATCTG 780

CAGGTGCACT ACGTATGATG GAATTGATGG GCTCCCTCTA CCCTCAATGT TACAGGATTT 840 

TTTAAAAGAG TATCATTATA AACAAAAAGT TAGAGTTCGC TGGTTGGAAC GAGAACCAGT 900 

CAAGGCAAAG TAAACTCTCC GGTCCCCAAA GGGTGTTAAC TAGGTCCGCT TTCATGTGCA 960 
TCAGACAGTA CACCTATAGC AAGCACACGT AGCAGTGTTA GGCTTTTTCA TACAGTATGT 1020

AAGCTTAGTG TTAGTATCTG TCAGATGCTA CCTGCTGTTA CTTATTCAGA TAAACATGGT 1080 

GCCTATTGGA ACAATAGCGG ATAGAGCTAC AGGTGTTCAG TAAGACTACA AAAACATTTT 1140

GCCTATTTCG CTAACAGTTT GGTTTTTAAT GGCTGTGGTA TTTGAGTGAG GCAACTCTGG 1200

GGCATTTGTT ATGAAGATAT G 1221



(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2369 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 116..1330



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



GGCACGAGGC GGTGGTGGCG GCGGCGGGCG CGGCCGCGGC GGGGCGGGCG CGGAATGAAG 60 

GCCCACGGCC CTGGGGGCTG AGGCGCCCGC CGCCTGGGGC GGGCCGCGCG TCCTC ATG 118 

Met 
1 

GAG GCC GGA GAG GAG CCG CTG CTG CTG GCT GAA CTC AAG CCT GGG CGC 166
Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly Arg 
5 10 15 

CCC CAC CAG TTC GAC TGG AAG TCA AGC TGC GAG ACC TGG AGC GTG GCC 214 
Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val Ala
20 25 30 

TTC TCG CCA GAC GGT TCC TGG TTC GCC TGG TCT CAA GGA CAC TGC GTG 262
Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys Val
35 40 45 

GTC AAG CTG GTC CCC TGG CCC TTA GAG GAA CAG TTC ATC CCT AAA GGA 310
Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys Gly
50 55 60 65 

TTC GAA GCC AAG AGC CGA AGC AGC AAG AAT GAC CCA AAA GGA CGG GGC 358
Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg Gly 
70 75 80 

AGT CTG AAG GAG AAG ACG CTG GAC TGT GGC CAG ATT GTG TGG GGG CTG 406 
Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly Leu
85 90 95 

GCC TTC AGC CCG TGG CCC TCT CCA CCC AGC AGG AAA CTC TGG GCA CGT 454 
Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala Arg
100 105 110 

CAC CAT CCC CAG GCG CCT GAT GTT TCT TGC CTG ATC CTG GCC ACA GGT 502 
His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr Gly
115 120 125 

CTC AAC GAT GGG CAG ATC AAG ATT TGG GAG GTA CAG ACA GGC CTC CTG 550
Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu Leu
130 135 140 145 

CTT CTG AAT CTT TCT GGC CAC CAA GAC GTC GTG AGA GAT CTG AGC TTC 598
Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser Phe
150 155 160 

ACG CCC AGC GGC AGT TTG ATT TTG GTC TCT GCA TCC CGG GAT AAG ACA 646
Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys Thr
165 170 175 

CTT CGA ATT TGG GAC CTG AAT AAA CAC GGT AAG CAG ATC CAG GTG TTA 694 
Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val Leu
180 185 190 

TCC GGC CAT CTG CAG TGG GTT TAC TGC TGC TCC ATC TCC CCT GAC TGT 742 
Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp Cys
195 200 205

AGC ATG CTG TGC TCT GCA GCT GGG GAG AAG TCG GTC TTT CTG TGG AGC 790
Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp Ser
210 215 220 225

ATG CGG TCC TAC ACA CTA ATC CGG AAA CTA GAA GGC CAC CAA AGC AGT 838
Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser Ser
230 235 240

GTT GTC TCC TGT GAT TTC TCT CCT GAT TCA GCC TTG CTT GTC ACA GCT 886
Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr Ala
245 250 255

TCG TAT GAC AGO AGT GTG ATT ATG TGG GAC CCC TAC ACC GGC GCG AGG 934
Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala Arg
260 265 270

CTG AGG TCA CTT CAT CAC ACA CAA CTT GAA CCC ACC ATG GAT GAC AGT 982
Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp Ser
275 280 285

GAC GTC CAC ATG AGC TCC CTG AGG TCC GTG TGC TTC TCA CCT GAA GGC 1030
Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu Gly
290 295 300 305

TTG TAT GTC GCT ACG GTG GCA GAT GAC AGG CTG CTC AGG ATC TGG GCT 1078
Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp Ala
310 315 320

CTG GAA CTG AAG GCT CCG GTT GCC TTT GCT CCG ATG ACC AAT GGT CTT 1126
Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly Leu
325 330 335

TGC TGC ACG TTC TTC CCA CAC GGT GGA ATT ATT GCC ACA GGG ACG AGA 1174
Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr Arg
340 345 350

GAT GGC CAT GTC CAG TTC TGG ACA GCT CCC CGG GTC CTG TCC TCA CTG 1222
Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser Leu
355 360 365

AAG CAC TTA TGC AGG AAA GCC CTC CGA AGT TTC CTG ACA ACG TAT CAA 1270
Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr Gln
370 375 380 385

GTC CTA GCA CTG CCA ATC CCC AAG AAG ATG AAA GAG TTC CTC ACA TAC 1318
Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr Tyr
390 395 400

AGG ACT TTC TAGCAGTGCC GGCTCCCCCA CCTCCTGCAG CAGCAGCAGT 1367
Arg Thr Phe
405

ACAAGGGACT GGCTAGGATG GAGTCAGGCA GCTCACACTG GACCAGTGTG GACCTTCCTT 1427

CCTCCCATGG CATGTGCAAG TAGGTCTGCG TGACCCCACT TCTGTGGTGC CGGCCTTACC 1487 

TCGTCTTCAT CCGTGGTGAG CAGCCTTCGT CAGTCTAGTT GTGTTGAAGC CAAGTGCAGT 1547

TGTGGATGTT GCTGGGGTAA TAAAGGCAAG CGGGCTCCAG AGCCTCTCTG GTGGCGGCCA 1607 

AGCCACACTC CCTTAACTGG GAAGTACCTG CCACGTAGGG CATTTCTGCT GCCTATTTCC 1667 

AGCCAGCGGC TGCATGGTTT GAAGTTCCTC CGTTGTGGTC AGAAGAACTC TGGTGTTTGG 1727 

TTCCCTGCTC AGCTGCGCGT GGACTGGGCT GAGCTCCTCA CCATACACTA GTGCCGGCTT 1787

TTGTTTCCTG TAAACAGTGG TTGCATGTGT AGAGAAGTAA CAAGCGAGTA TTCAGATCAT 1847 

ACGAGGAGGC GTTCCTCGGT GCATGACGGT CAGATGGCCA TTTATCAGCA TATTTATTTG 1907 

TATTTTCTCA GCACATAGTA AGGTACAACT GTGTTTTCTC AATTGTCTCG AAAAAACAGA 1967

GTTCTTAAGT GGCCCAGTTG TGGAGCCAAG TCTAAGTCGT GTGGAGTCAG TGCTGACATC 2027

ACTGGCTTGT GCTGTCTGTC ACATGTGTTT GTCTCTGCTG CTTGACCTCA TGGGATGTAC 2087

CCTCCAGTTC AACTGCCCAA AACAGACAGC CCCTTCCAAG CACCGTTCTT TGACAGCGGT 2147 

AGCAGCTACC TATTCAAGAC GCCTCACACA AAATCTGCCT TAGAAAGTTA ATATATTTTA 2207 

AATTATTTTA AAAGAAACTC AACATCTTAT TCTTTGGCCT TTCTTAATTG ATGCTTTATG 2267

GAGGCAGTGT TAACATTGTA CAGTGTATGC ATAGAGGAGT CTCCTCTATT TGAAGAACAA 2327

TGCAAAATGA GGCTTTCATT GAAGGGAAAA AAAAAAAAAA AA 2369

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 404 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Met Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly 
1 5 10 15

Arg Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val
20 25 30 

Ala Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys
35 40 45 

Val Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys
50 55 60 

Gly Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg 
65 70 75 80

Gly Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly
85 90 95 

Leu Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala 
100 105 110 

Arg His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr
115 120 125 

Gly Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu
130 135 140 

Leu Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser
145 150 155 160 

Phe Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys
165 170 175 

Thr Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val
180 185 190 

Leu Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp
195 200 205 

Cys Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp 
210 215 220 
Ser Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser
225 230 235 240 

Ser Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr 
245 250 255 

Ala Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala
260 265 270 

Arg Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp
275 280 285 

Ser Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu 
290 295 300 

Gly Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp
305 310 315 320 

Ala Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly 
325 330 335 

Leu Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr
340 345 350 

Arg Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser
355 360 365 

Leu Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr 
370 375 380 

Gln Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr
385 390 395 400 

Tyr Arg Thr Phe 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1246 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

GACACTGCAT CGTCAAACTG ATCCCCTGGC CGTTGGAGGA GCAGTTCATC CCTAAAGGGT 60 

TTGAAGCCAA AAGCCGAAGT AGCAAAAATG AGACGAAAGG GCGGGGCAGC CCAAAAGAGA 120

AGACGCTGGA CTGTGGTCAG ATTGTCTGGG GGCTGGCCTT CAGCCTGTGC TTTCCCCACC 180

CAGCAGGAAG CTCTGGGCAC GCCACCACCC CCAAGTGCCC GATGTCTCTT GCCTGGTTCT 240 

TGCTACGGGA CTCAACGATG GGCAGATCAA GATCTGGGAG GTGCAGACAG GGCTCCTGCT 300 

TTTGAATCTT TCCGGCCACC AAGATGTCGT GAGAGATCTG AGCTTCACAC CCAGTGGCAG 360 

TTTGATTTTG GTCTCCGCGT CACGGGATAA GACTCTTCGC ATCTGGGACC TGAATAAACA 420

CGGTAAACAG ATTCAAGTGT TATCGGGCCA CCTGCAGTGG GTTTACTGCT GTTCCATCTC 480 

CCCAGACTGC AGCATGCTGT GCTCTGCAGC TGGAGAGAAG TCGGTCTTTC TATGGAGCAT 540 

GAGGTCCTAC ACGTTAATTC GGAAGCTAGA GGGCCATCAA AGCAGTGTTG TCTCTTGTGA 600 

CTTCTCCCCC GACTCTGCCC TGCTTGTCAC GGCTTCTTAC GATACCAATG TGATTATGTG 660 

GGACCCCTAC ACCGGCGAAA GGCTGAGGTC ACTCCACCAC ACCCAGGTTG ACCCCGCCAT 720 

GGATGACAGT GACGTCCACA TTAGCTCACT GAGATCTGTG TGCTTCTCTC CAGAAGGCTT 780 

GTACCTTGCC ACGGTGGCAG ATGACAGACT CCTCAGGATC TGGGCCCTGG AACTGAAAAC 840 

TCCCATTGCA TTTGCTCCTA TGACCAATGG GCTTTGCTGG CACATTTTTT CCACATGGTG 900 

GAGTCATTGC CACAGGGACA AGAGATGGCC ACGTCCAGTT CTGGACAGCT CCTAGGGTCC 960 

TGTCCTCACT GAAGCACTTA TGCCGGAAAG CCCTTCGAAG TTTCCTAACA ACTTACCAAG 1020

TCCTAGCACT GCCAATCCCC AAGAAAATGA AAGAGTTCCT CACATACAGG ACTTTTTAAG 1080 

CAACACCACA TCTTGTGCTT CTTTGTAGCA GGGTAAATCG TCCTGTCAAA GGGAGTTGCT 1140 

GGAATAATGG GCCAAACATC TGGTCTTGCA TTGAAATAGC ATTTCTTTGG GATTGTGAAT 1200 
AGAATGTAGC AAAACCAGAT TCCAGTGTAC TAGTCATGGA TTTTTC 1246

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

ACCATGGTTC CAAGTCCTCT CCCCTGTGGT CAAGTTGCCC GAATGTTGGG CCCAAGTGCC 60 

TTTTCCTCCT TGGGCCTCCC CTTCTGACCT GCAGGACAGT TTTCCGGAGC CCATTTGGTA 120 

TGAGGTATTA ATTAGCCTTA ACTAAATTAC AGGGGACTCA GAGGCCGTGC TCCTGACCGA 180 

TCCAGACACT ATTTTTTTTT TTTTTTTTTA ACAATGGTGT GCATGTGCAG GAAATGACAA 240 

ATTTGTATGT CAGATTATAC AAGGATGTAT TCTTAAACCG CATGACTATT CAGATGGCTA 300

CTGAGTTATC AGTGGCCATT TATTAGCATC ATATTTATTT GTATTTTCTC AACAGATGTT 360

AAGGTACAAC TGTGTTTTTC TCGATTATCT AAAAACCATA GTACTTAAAT TGAAAAAAAA 420

AA 422



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2019 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GGCACGAGGC GGGGTCAGGG CGGAGGCTGA GGACCAAGTA GGCATGGCGG AGGGCGGGAC 60 

CGGCCCCGAT GGACGGGCCG GCCCGGGACC CGCAGGTCCT AATCTGAAGG AGTGGCTGAG 120

GGAGCAGTTC TGTGACCATC CACTGGAGCA CTGTGACGAT ACAAGACTCC ATGATGCAGC 180 

CTATGTAGGG GACCTCCAGA CCCTCAGGAA CCTACTGCAA GAGGAGAGCT ACCGGAGCCG 240 

CATCAATGAG AAGTCTGTCT GGTGCTGCGG CTGGCTTCCC TGCACACCAC TGAGGATCGC 300 

AGCCACTGCA GGCCATGGGA ACTGTGTGGA CTTCCTCATA CGCAAAGGGG CCGAGGTGGA 360 

CCTGGTGGAT GTCAAGGGGC AGACTGCCCT GTATGTGGCT GTAGTGAACG GGCACTTGGA 420

GAGCACTGAG ATCCTTTTGG AAGCTGGTGC TGATCCCAAC GGCAGCCGGC ACCACCGCAG 480 

CACTCCTGTG TACCATGCCT YTCGTGTGGG TAGGGACGAC ATCCTGAAGG CTCTTATCAG 540

GTATGGGGCA GATGTTGATG TCAACCATCA TCTGAATTCT GACACCCGGC CCCCTTTTTC 600 

ACGGCGGCTA ACCTCCTTGG TGGTCTGTCC TCTATACATC AGTGCTGCCT ACCATAACCT 660 

TCAGTGCTTC AGGCTGCTCT TGCAGGCTGG GGCAAATCCT GACTTCAATT GCAATGGCCC 720

TGTCAACACC CAGGAGTTCT ACAGGGGATC CCCTGGGTGT GTCATGGATG CTGTCCTGCG 780 

CCATGGCTGT GAAGCAGCCT TCGTGAGTCT GTTGGTAGAG TTTGGAGCCA ACCTGAACCT 840 

GGTGAAGTGG GAATCCCTGG GCCCAGAGGC AAGAGGCAGA AGAAAGATGG ATCCTGAGGC 900 

CTTGCAGGTC TTTAAAGAGG CCAGAAGTAT TCCCAGGACC TTGCTGAGTT TGTGCCGGGT 960 

GGCTGTGAGA AGAGCTCTTG GCAAATACCG ACTGCATCTG GTTCCCTCGC TGCCGCTGCC 1020

AGACCCCATA AAGAAGTTTT TGCTTTATGA GTAGCATTCA CATGCAGTGC TGACTGCAAT 1080 

GTGGAAGCCG ATCACCTGCA GTGAAAACTG ACACAGACTC TGGCATCCTG GGAACCATGG 1140 

CCTGTGCTGC CAGCTTGATC CTTGGCTGTC AGTGAAGAAA AAACGGCTGT GTTCTCTTGG 1200

ACTGTGATTC TATCTCAGGT GCTTGGGCCA TCGAACGCTC CTTGAGTCAT TGTCAACTGA 1260
GAGGCACATA CAAACTTAAT TTTGTTCCTC TTCAGTCTCT CTGTTTTGGA TTCTTCCTGG 1320

CAATGTGTGC AGCATGGGCT GAGCCTGGTG ATTGCCCTAG TGGGGAAGGC TTTTTTCTCC 1380

AGGCTATGCA TCTATTTATG TTCCTACTTT GCAATTTATT GTTCTTTTAA GGCTTGATAT 1440 

CAAAACAGAA AGAGGTTTGT TAAGAAAAGA TATAGGGAGA AAGGAATTCC GGTTCCGTGC 1500 

ACTTGCTAGC CTGCTTTCCT TGCCTGGGTT TGTCTGTCTA TGCTGCCTGG TGCACATCCC 1560 

TTCTCTTTGC TGCCACTGTT CTATTTTGGG AGTTGTCTTC CGTCTAAGAT GGCTTCTGGG 1620

GTTCTATCTT ATTGCACAGA GGTCCCAGAA CAGTGTTCAT AGGGCACCAT CTGCTCTGCC 1680 

AAGGGTTTTC TGATGTCTTA CCCTGGGGAT CTTCAGACAG TGGTTACCTT TAGGAGACCC 1740 

ACCTGGAACT AACCATTAAG TGACTGCCCA CATTCAGATC AGGGACCATC TTAATAGTAC 1800 

TCACTGCCAG TCCTCACAAG AGAAGATGAC ACGGGTGCTC TCTTCAGACA CTCCCATACA 1860 

GGAAGTTGGA AAATGTCTTG GTCACCTGGG TTGTTCCCAG GCTACAACTT CTTGGTGTTC 1920 

CACTAARACC AGRATATCCT AGTTTTTTGG GTTGACTGTT CCCTCCCCAC TTTCCTTGAA 1980 

NCCCAATGCC CNTTTGTKTM GGTTGCTTCC CTAAAAKTT 2019



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 350 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Ala Arg Gly Gly Val Arg Ala Glu Ala Glu Asp Gin Val Gly Met Ala 
1 5 10 15
Glu Gly Gly Thr Gly Pro Asp Gly Arg Ala Gly Pro Gly Pro Ala Gly
20 25 30 

Pro Asn Leu Lys Glu Trp Leu Arg Glu Gln Phe Cys Asp His Pro Leu
35 40 45 

Glu His Cys Asp Asp Thr Arg Leu His Asp Ala Ala Tyr Val Gly Asp 
50 55 60 

Leu Gln Thr Leu Arg Asn Leu Leu Gln Glu Glu Ser Tyr Arg Ser Arg
65 70 75 80 

Ile Asn Glu Lys Ser Val Trp Cys Cys Gly Trp Leu Pro Cys Thr Pro
85 90 95 

Leu Arg Ile Ala Ala Thr Ala Gly His Gly Asn Cys Val Asp Phe Leu
100 105 110 

Ile Arg Lys Gly Ala Glu Val Asp Leu Val Asp Val Lys Gly Gln Thr
115 120 125 

Ala Leu Tyr Val Ala Val Val Asn Gly His Leu Glu Ser Thr Glu Ile
130 135 140 

Leu Leu Glu Ala Gly Ala Asp Pro Asn Gly Ser Arg His His Arg Ser 
145 150 155 160 

Thr Pro Val Tyr His Ala Xaa Arg Val Gly Arg Asp Asp Ile Leu Lys
165 170 175 

Ala Leu Ile Arg Tyr Gly Ala Asp Val Asp Val Asn His His Leu Asn
180 185 190 

Ser Asp Thr Arg Pro Pro Phe Ser Arg Arg Leu Thr Ser Leu Val Val 
195 200 205 

Cys Pro Leu Tyr Ile Ser Ala Ala Tyr His Asn Leu Gln Cys Phe Arg
210 215 220 

Leu Leu Leu Gin Ala Gly Ala Asn Pro Asp Phe Asn Cys Asn Gly Pro 
225 230 235 240 

Val Asn Thr Gln Glu Phe Tyr Arg Gly Ser Pro Gly Cys Val Met Asp
245 250 255

SUBSTITUTE SHEET (RULE 26)

WO 98/20023 PCT/AU97/00729

Ala Val Leu Arg His Gly Cys Glu Ala Ala Phe Val Ser Leu Leu Val 
260 265 270 

Glu Phe Gly Ala Asn Leu Asn Leu Val Lys Trp Glu Ser Leu Gly Pro 
275 280 285 

Glu Ala Arg Gly Arg Arg Lys Met Asp Pro Glu Ala Leu Gin Val Phe 
290 295 300 

Lys Glu Ala Arg Ser Ile Pro Arg Thr Leu Leu Ser Leu Cys Arg Val
305 310 315 320 

Ala Val Arg Arg Ala Leu Gly Lys Tyr Arg Leu His Leu Val Pro Ser 
325 330 335 

Leu Pro Leu Pro Asp Pro Ile Lys Lys Phe Leu Leu Tyr Glu
340 345 350 



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 419 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

GCATCCATGG CGGAGGGCGG CAGCACGACG GGCGGGCAGG GCCGGGCTCC GCAGGTCGTA 60 

ATCTGAAGGA GTGGCTGAGG GAGCAATTTT GTGATCATCC GCTGGAGCAC TGTGAGGACA 120 

CGAGGCTCCA TGATGCAGCT TACGTCGGGG ACCTCCAGAC CCTCAGGAGC CTATTGCAAG 180 

AGGAGAGCTA CCGGAGCCGC ATCAACGAGA AGTCTGTCTG GTGCTGTGGC TGGCTCCCCT 240

GCACACCGTT GCGAATCGCG GCCACTGCAG GCCATGGGAG CTGTGTGGAC TTCCTCATCC 300

GGAAGGGGGC CGAGGTGGAT CTGGTGGACG TAAAAGGACA GACGGCCCTG TATGTGGCTG 360
TGGTGAACGG GCACCTAGAG AGTACCCAGA TCCTTCTCGA AGCTGGCGCG GACCCCAAC 419

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 595 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

GAGGAAGAAG AAAAGTGGAC CCTGAGGCCT TGCAGGTCTT TAAAGAGGCC AGAAGTGTTC 60 

CCAGAACCTT GCTGTGTCTG TGCCGTGTGG CTGTGAGAAG AGCTCTTGGC AAAACCGGCT 120 

TCATCTGATT CCTTCGCTGC CTCTGCCAGA CCCCATAAAG AAGTTTCTAC TCCATGAGTA 180 

GACTCCAAGT GCTGCGGTTG ATTCCAGTGA GGGAGAAAGT GATCTGCAGG GAGGTGGACA 240 

CCGAGCCCTG AGTGCTGTGC TGCTGCTGGT CTCCTGATGG CTGTTGCTGC AGAAGATGTC 300 

CTCGTAGACT GTCATTGCTC CTCAGGTGCC TGGGCCGCTG AACAGTCCTT GGGTCATTGT 360

CAGCTGAGAG GCTTATACTA AAGTTATTAT TGTTTTTCCC AAGTTCTCTG TTCTGGATTT 420 

TCAGTTGCAT ATTAATGTAA CGGGCCATGG GGTATGTACA TGTAGGGGCT GAGGTTGGAG 480 

GCCTACTAAT TTCCTGTAGG GAAGACTCCC AGCACTTCTG GAACTGTGCT TCTCTTTATT 540 

TTTCTACTTC TCAATTTGAT GGTTCGATTA AAGCCTTCTA GTATCTCAAT GAAAA 595 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 896 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 4..396



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTG ATG TCC GCA ATT CTG AAG GTT GGA CAC CAC TGC TGG CTG CCT GTG 48
Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val
1 5 10 15

ACA TCC GCT GTC AAT CCC CAA AGG ATG CTG AGG CCA CCA CCA ACC GCT 96
Thr Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala
20 25 30 

GTT TTC AAC TGT GCC GCT TGC TGC TGT CTG TGG GGG CAG ATG CTG ATG 144 
Val Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met
35 40 45 

AAT ACA TAC CGT GTA GTT CAG CTT CCT GAG GAG GCC AAG GGC TTG GTG 192 
Asn Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val
50 55 60 

CCA CCA GAG ATT CTA CAG AAG TAC CAT GGA TTC TAC TCT TCC CTC TTT 240 
Pro Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe
65 70 75 

GCC TTG GTG AGG CAG CCC AGG TCG CTG CAG CAT CTC TGC CGT TGT GCG 288
Ala Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala
80 85 90 95 

CTC CGC AGT CAC CTG GAG GGC TGT CTG CCC CAT GCA CTA CCG CGC CTT 336 
Leu Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu 
100 105 110

CCC CTG CCA CCG CGC ATG CTC CGC TTT CTG CAG CTG GAC TTT GAG GAT 384
Pro Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp
115 120 125

CTG CTC TAC TAGGCTTGCT GCCCTGTGAA CAAAGCAGAC CCCACCCCCA 433 
Leu Leu Tyr 
CCCCAAGGGC ATCTCTCAGC AATGAATGAT GCAAGGCGGT CTGTCTTCAA GTCAGGAGTG 493

GACGCCTTGA TCCACACTTG AGAGAAGAGG CCAGATCAGC ACCYGGCTGG TAGTGATNGC 553 

AGAGGGCACC TGTGCAGATC TGTGTGCGCA CTGGAAATCT CTAGGCTGAA GGCYAGAGCA 613 

AATGGTGCAR GTGTTAGTCC TTGGGANGAG AGACAGANGG TGAGAAAGCA AGACAGAGGT 673 

GAGAGTGCAC ATGTCAAGTG GTAGATTGCC TTAAAAGAAA GCTAAAAAAA GAAAAAGATT 733

CGGGCGAACT TCTTTAGGGG TAATGCTGCA GCGTGTTAAA CTGACTGACC AGCGTCCATA 793 

TCTTTGGACC CTTCCCGGGT GAAAAAGCCC CTTCATCCTC CAGCGCTCCC CAAGGGTGCT 853 

TAGCAATACC GGGTGCTTTT CTGCCGCAAA GTGAGTTACC AAA 896 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 130 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val Thr
1 5 10 15

Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala Val
20 25 30 

Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met Asn
35 40 45 

Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val Pro
50 55 60 

Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe Ala
65 70 75 80 

Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala Leu
85 90 95 
Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu Pro
100 105 110 

Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp Leu
115 120 125 

Leu Tyr 
130 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 436 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GTGGGGGCGT CATCATGACC TCCTCTAGGG CTCTGCAACA TGACTCCTGT GGTGCAAATC 60 

AACAAATTGT TCACTGATGA ATCCACAAGG ATCTCTGGGC CTACAACCAG GTCCTGGTCC 120 

ACATGACTGT CGTCTTCGGA GAAGGCACCA CTCGCCCCCG GCAGGTACGG CTGACACCTC 180 

CATGGGAGAA GACGTATCCA GGCAGCAGCT GCGCGGCCCT TCAAGAGGGC ACATCCCGTC 240 

ATCTAAAGGC ACGGTGTACT GAAGGTAGTC CTGAGACATG AGTCCGATTA CTACAGGCAC 300

GTGTTCCTCC AGGTGGAGGC TCAGGTCCCC GGGTGAGCTG GGGCTGCAGC GGGACTCAGG 360

GCGCGGCTCT GGCTGCAGGT CTCGCAGCTC CCTGGGCTGT AGCTCCCGCA GATCCTTGCG 420

CACACCGTTG ACTGGT 436
(2) INFORMATION FOR SEQ ID NO: 31: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2180 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

TTAATAGTAC CTACATAGTA GAAAATTATA ACTCCACTTT AAAACAATGT TTTCTTTCTA 60 

TTCAAATCAA TTTAAAACTT TTTATAAACA TTAATGTTGC AAGAGAATCC AGTCCATTTA 120 

TGAAAATTAG TTGACAATCA AGTTCACCCA AGAAAATGTT GACTAAGCTA AAGAAATCAC 180 

AGATAAAACA TTTTACCAAA AGGATAGGTA ACACACAAAA TAATGCTATC ACAGGAAGCT 240

ATGATCATCT AATATTTCTT TAATAATAAT TCTAGTTCCA TAGGTTTTCA TGTTATGCCA 300 

ATTTGTACCC GAGTTTAATT ACAGAAAAGG CAACAATTTC TAAATTGGTG GTATACATTT 360

CTTTACAATT TTTTAATGTA AGGCCATTTA TTAAAATAGA CAAACTAGAA GATGAAAACG 420

AAGGCAACAG AAAAATTCAA CTTTTCACAA CCAAAAGAAT TAGCACAACC TTAGAAATAA 480 

TTTAGAAAAA AGTGTTGTTA AAAGATATGT TGCAGATCTC CGTTCCATTA CCCAAGATTA 540 

TGTCAATTCA CGATTCTAAA TAAATCTTTT TAAAGTAAGA GATTAAAAAC TCATCTTCAG 600 

TGTATATGTA AATTCCGTGG TTTTATCACA CAGGTATGTT TATTCAACAC TGCTTTGGAA 660 

ATGGACCATT TAAAAGGACA TGGCAATTTC CATTCTGTTA AGTTTCATTC AACCTTTACT 720

TAGGGGTTGA TTACCACATG AAATGTGCTT TTAATGCATA AAAATCACAG TGGATTAGCC 780

AGCAAAAGGG ACTGGGCGGG GGGGGCATTG AGGAGAATTT GATAATTCAC ATTGTGATTA 840 

TTCTGCACAT TGATGAAACA TAATTCACAC CTCTAAAACC TCAAGACTTC CCTTTTTTAA 900 

AGAACCAAAA TAAACCCAAG ACACCTTGCT GACACTTCCC CACCCCTAAA CAAACTGATG 960 

ACTCTTTTAC ACATAAAACT GAAATAGTTA TGGCAGCAAA AGATTTTGAT GGCAATGAAA 1020 
GTTTGTAAAC TGTATTTCAA TCTCTTGTTC TTATTCCCAA AGTGCAAGAT GCAGGGTTCT 1080

CAATCTTTCA GTAGTGCTTC TCCTGTAAAT AATCCTTCAT TTTGTTTGGC AAAGGCAGTT 1140

TCTGAATTAA GTCTATTCTG GTATACTGAC GTATAACAAA ACGACACAGG TACTGCAACG 1200 

AGCGCACCTA TGAACCCCGG AACACTGGTT GGCAAGTTCT GACGGAAGTG CAGATTCCAG 1260

GCAGCGAGAC CTTGAATAAC AAAAAGCTCC CATTTTCAGA GTCCCTGATT GAATGCTCCA 1320 

ATTAGATCAA CTATGGACGT ATGTCCTTCC ACATCGGCTG TTCATAAAAG CTAAACCTAC 1380 

CATTTGAGTG CTCAATTCTA GTGTGAAGTG TTTTACCATG GGAGCGAAAG TCACAGCTTA 1440 

AAAGGTAACG GTCGTCAGAA CTGTCCCGAA CAAGAAAAGA ACCATCTGGC ACGTTTGCTA 1500 

GCTTCCCTTC TGCCTCCCAA CGTGTGATTG GTCCCCAGTA CCATCCTTGC TTTGCAAGTT 1560 

TTTTCAGCTC CTCTGTAAGG CTTGTCACAA CCATGGGACC ACTACTTTGC ACTGAGTCAT 1620 

AAACTCTTGC AACCCCAGGA GCAGAGTTCG GATCAAAATT CAAATGACAG CGCATAACTT 1680

TCAGCCACGT GGGGCTTTCT GTCCAGTGAG TCCACTGAAA GTTCCCCTTT GGGATTTGGA 1740 

TTATTCCTGC ATTGGAGTAA CCAATGGTGA AGATTGGAGG GACATCCATC GTGAACCCGC 1800 

TCTCCGGGGT TCTGCAACAT GACTCCCGTG GTGCCAATCA ACAAGCCATT CACCGGACTG 1860 

ATCCACGAAG ATCTCTGGGG CGACAACTAG GTCCTGGTCT ACCTGACTCT CATCCTCGGG 1920

GAAAGCGCGC CCTCCCACTT GAGGAGGAAC CGCAGAGACT TCCATGGGAG AAGAGCTGTC 1980 

CAGACAATAG CTCCGTGATC CTTCCAAAGG ATACATCCCC TCATCTAAAG GCACAGTATA 2040

CTGAATGTAG TCCTGAGGCA TAAGTCCAAT AACGACAGGC ACATGTTCAT CCAGGTGAAG 2100 

ATGCAGGTCT CCATTATGAG AAGCCGAGCT CTTCAGTGAA TTGGCTTGCT CCTGGCACGT 2160

GGTCTCAGAC TGGAGGTCGT 2180



(2) INFORMATION FOR SEQ ID NO: 32: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2649 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

GGCACGAGGC TGTGTCCAGC ACACAGAGAG GGCCCGGCCA TCTGCTTTGG TTCAGAGCCC 60 

TGTGTCTGTC TGTCACTTAG ACTCTTCCTC CCGGCTCGCA GCTCACCCTC CATCCTCCTT 120

ACTGGCTCCA GCATGACTCG CTTCTCTTAT GCAGAGTACT TTGCTCTGTT TCACTCTGGC 180 

TCTGCACCTT CCAGGTCCCC TTCGTCTCCC GAGAACCCAC CGGCCCGCGC ACCCCTGGGT 240 

CTGTTCCAAG GGGTCATGCA GAAGTATAGC AGCAACCTGT TCAAGACCTC CCAGATGGCG 300 

GCTATGGACC CCGTGCTGAA GGCCATCAAG GAAGGGGATG AAGAGGCCTT GAAGATCATG 360 

ATCCAGGATG GGAAGAATCT TGCAGAGCCC AACAAGGAGG GCTGGCTGCC GCTCCACGAG 420

GCTGCCTACT ATGGCCAGCT GGGCTGCCTG AAAGTCCTGC AGCAAGCCTA CCCAGGGACC 480 

ATTGACCAAC GCACACTGCA GGAAGAGACA GCATTATACC TGGCCACATG CAGAGAACAC 540 

CTGGATTGCC TCCTGTCGCT GCTCCAGGCG GGGGCAGAGC CTGACATCTC TAACAAATCC 600 

AGGGAGACTC CACTTTACAA AGCCTGTGAG CGCAAGAACG CGGAGGCGGT GAGGATATTG 660 

GTGCGATACA ACGCAGACGC CAACCACCGC TGTAACAGGG GCTGGACCGC ACTGCACGAG 720

TCTGTCTCCC GCAATGACCT GGAGGTCATG GAGATCCTAG TGAGTGGCGG GGCCAAGGTG 780

GAGGCCAAGA ATGTCTACAG CATCACCCCT TTGTTTGTGG CTGCCCAGAG TGGGCAGCTG 840 

GAGGCCCTGA GGTTCCTGGC CAAGCATGGT GCAGACATCA ACACGCAGGC CAGTGACAGT 900 

GCATCAGCCC TCTACGAGGC CAGCAAGAAT GAGCATGAAG ACGTGGTAGA GTTTCTTCTC 960 

TCTCAGGGCG CCGATGCTAA CAAAGCCAAC AAGGACGGCC TGCTCCCCCT GCATGTTGCC 1020 
TCCAAGAAGG GCAACTATAG AATAGTGCAG ATGCTGCTGC CTGTGACCAG CCGCACGCGC 1080

GTGCGCCGTA GCGGCATCAG CCCGCTGCAC CTAGCGGCCG AGCGCAACCA CGACGCGGTG 1140 

CTGGAGGCGC TGCTGGCCGC GCGCTTCGAC GTGAACGCAC CTCTGGCTCC CGAGCGCGCC 1200 

CGCCTCTACG AGGACCGCCG CAGTTCTGCG CTCTACTTCG CTGTGGTCAA CAACAATGTG 1260

TACGCCACCG AGCTGTTGCT GCTGGCGGGC GCGGACCCCA ACCGCGATGT CATCAGCCCT 1320

CTGCTCGTGG CCATCCGCCA CGGCTGCCTG CGCACCATGC AGCTGCTGTT GGACCATGGC 1380

GCCAACATCG ACGCCTACAT CGCCACTCAC CCCACCGCCT TTCCAGCCAC CATCATGTTT 1440 

GCCATGAAGT GCCTGTCGTT ACTCAAGTTC CTTATGGACC TCGGCTGCGA TGGCGAGCCC 1500

TGCTTCTCCT GCCTGTACGG CAACGGGCCG CACCACCCGC CCCGCGACCT GGCCGCTTCC 1560 

ACGACGCACC CGTGGACGAC AAGGCACCTA GCGTGGTGCA GTTCTGTGAG TTCCTGTCGG 1620 

CCCCGGAAGT GAGCCGCTGG GCGGGACCCA TCATCGATGT CCTCCTGGAC TATGTGGGCA 1680 

ACGTGCAGCT GTGCTCCCGG CTGAAGGAGC ACATCGACAG CTTTGAGGAC TGGGCTGTCA 1740 

TCAAGGAGAA GGCAGAACCT CCGAGACCTC TGGCTCACCT CTGCCGGCTG CGGGTTCGGA 1800 

AGGCCATAGG AAAATACCGG ATAAAACTCC TGGACACACT GCCGCTTCCC GGCAGGCTAA 1860 

TCAGATACTT GAAATATGAG AATACACAGT AACCAGCCTG GAGAGGAGAT GTGGCCTTCA 1920 

GACTGTTTCC GGGACGCCCC AGGTGGCCTG CATCCAGGAC CCCCTGGGGT CAGAACAGGT 1980 

GTGACCTTGC TGGTTCTTTG CTGGAGCTTC ACCCAAAGTG AGAACCTGAT GTGGGGAGTG 2040 

GACGTGGAAC CTCTGCTTTC ACACTGTCAG CGGATCGCAG ACCCGCTCTG CTTCTGGCCA 2100 

TAGCCAGAGA CCTTCAACCT GGGGCCAGGG GAGAGCTGGT CTGGGCAAGG TGGCCCAGGC 2160 

AGGAATCCTG GCCTTAAGCT GGAGAACTTG TAGGAATCCC TCACTGGACC CTCAGCTTTC 2220

AGGCTGCGAG GGAGACGCCC AGCCCAAGTA TTTTATTTCC GTGACACAAT AACGTTGTAT 2280

CAGAAAAAAA AAAAAACATG GGCGCAGCTT ATTCCTTAGT AGGGTATTTA CTTGCATGCG 2340

CGCTTAAAGC TACTGGAAAC ATGCGTTCCA CTATGCTTGA GAATCCCCTT GCACTGGTAA 2400 
ACGAGAGCCG ACGTGCTTCA AGGTTGGATT TTTGGTTGCC CCTTTGGCGT TCCGCGGGTT 2460

TGTCCGACGT AATTGACCCC GTGTTTTGTC ACTTTCGAGT GTTCCGACTA TTGGGGGGCT 2520

TTTGGTTGTC CCCAAAATTG TGGGTGGTGT GCGGACGCCA CGAGAAGTGG TTCATGGGCG 2580

ATAATCATTA CTGGAGAATG TAGAGCGGCG GTTTTACGAA TAAATATTTT TTAAGCCGCC 2640

TTCCCAAAA 2649



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 495 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

CCTCCTGAGA GTTCGCCGGC CCGGGCCCAA TGGGTTGTTC CAAGGGGTCA TGCAGAAATA 60 

CAGCAGCAGC TTGTTCAAGA CCTCCCAGCT GGCGCCTGCG GACCCCTTGA TAAAGGCCAT 120 

CAAGGATGCG ATGAAGAGGC CTTGAAGACC ATGATCAAGG AAGGGAAGAA TCTCGCAGAG 180 

CCCAACAAGG AGGGCTGGCT GCCGCTGCAC GAGGCCGCAT ACTATGGCCA GGTGGGCTGC 240

CTGAAAGTCC TGCAGCGAGC GTACCCAGGG ACCATCGACC AGCGCACCCT GCAGGAGGAA 300

ACAGCCGTTT ACTTGGCAAC GTGCAGGGGC CACCTGGACT GTCTCCTGTC ACTGCTCCAA 360

GCAGGGGCAG AGCGGGACAT CTCCAACAAA TCCCGAGAGA ACCGCTCTAC AAAGCCTGTG 420

AGCGCAAGAA CGCGGAAGCC GTGAAGATTC TTGGTGCAGC ACAACGCAGA CACCAACAAC 480

GCTGCAACCG GGCTG 495 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 709 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

GTGCAGCTCT GCTCGCGGCT GAAGGAACAC ATCGACAGCT TTGAGGACTG GGCCGTCATC 60 

AAGGAGAAGG CAGAACCTCC AAGACCTCTG GCTCACCTTT GCCGACTGCG GGTTCGAAAG 120 

GCCATTGGGA AATACCGTAT AAAACTCCTA GACACCTTGC CGCTCCCAGG CAGGCTGATT 180 

AGATACCTGA AATACGAGAA CACCCAGTAA CTGGGGCCAC GGGGAGAGAG GAGTAGCCCC 240

TCAGACTCTT CTTACTAAGT CTCAGGACGT CGGTGTTCCC AACTCCAAGG GGACCTGGTG 300

ACAGACGAGG CTGCAGGCTG CCTCCCTCTC AGCCTGGACA GCTACCAGGA TCTCACTGGG 360

TCTCAGGGCC CAGAGCTTTG GCCAGAGCAG AGAACAGAAT GTGTCAAGGA GAAGAATCAT 420

TTGTTTACAA ACTGATGAGC AGATCCCAGA CCTTCTCTAC CTTCAGGAAT GGCAGAAACC 480 

TCTATTCCTG GGGCCAGGGC AGAGCTTGAG GTGTTCTGGG GAAGGTGGTG CTCAGAGCCT 540 

TCCCTGTGCC CCTCCACTTG TTCTGGAAAA CTCACCACTT GACTTCAGAG CTTTCTCTCC 600 

AAAGACTAAG ATGAAGACGT GGCCCAAGGT AGGGGGTAGG GGGAGCCTGG GTCTTGGAGG 660 

GCTTTGTTAA GTATTAATAT AATAAATGTT ACACATGTGA AAAAAAAAA 709 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 848 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..624



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

TTG GAG AAG TGT GGT TGG TAT TGG GGG CCA ATG AAT TGG GAA GAT GCA 48

Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala

1 5 10 15

GAG ATG AAG CTG AAA GGG AAA CCA GAT GGT TCT TTC CTG GTA CGA GAC 96 

Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp 

20 25 30 

AGT TCT GAT CCT CGT TAG ATC CTG AGC CTC AGT TTC CGA TCA CAG GGT 144

Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly

35 40 45 

ATC ACC CAC CAC ACT AGA ATG GAG CAC TAC AGA GGA ACC TTC AGC CTG 192 

Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu

50 55 60 

TGG TGT CAT CCC AAG TTT GAG GAC CGC TGT CAA TCT GTT GTA GAG TTT 240 

Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe

65 70 75 80 

ATT AAG AGA GCC ATT ATG CAC TCC AAG AAT GGA AAG TTT CTC TAT TTC 288 

Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe

85 90 95 

TTA AGA TCC AGG GTT CCA GGA CTG CCA CCA ACT CCT GTC CAG CTG CTC 336 
Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu

100 105 110 

TAT CCA GTG TCC CGA TTC AGC AAT GTC AAA TCC CTC CAG CAC CTT TGC 384

Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys

115 120 125 

AGA TTC CGG ATA CGA CAG CTC GTC AGG ATA GAT CAC ATC CCA GAT CTC 432 

Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu

130 135 140 
CCA CTG CCT AAA CCT CTG ATC TCT TAT ATC CGA AAG TTC TAG TAG TAT 480
Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr
145 150 155 160 

GAT CCT CAG GAA GAG GTA TAC CTG TCT CTA AAG GAA GCG CAG CGT CAG 528 
Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln
165 170 175 

TTT CCA AAC AGA AGC AAG AGG TGG AAC CCT CCA CGT AGC GAG GGG CTC 576

Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu 
180 185 190 

CCT GCT GGT CAC CAC CAA GGG CAT TTG GTT GCC AAG CTC CAG CTT TGAAGAACCA 631

Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu
195 200 205 

AATTAAGCTA CCATGAAAAG AAGAGGAAAA GTGAGGGAAC AGGAAGGTTG GGATTCTCTG 691 

TGCAGAGACT TTGGTTCCCC ACGCAAGCCC TGGGGCTTGG AAGAAGCACA TGACCGTACT 751 

CTGCGTGGGG CTCCACCTCA CACCCACCCC TGGGCATCTT AGGACTGGAG GGGCTCCTTG 811 

GAAAACTGGA AGAAGTCTCA ACACTGTTTC TTTTTCA 848

(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 207 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala 
1 5 10 15

Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp
20 25 30

Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly
35 40 45

Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu
50 55 60

Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe
65 70 75 80

Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe
85 90 95

Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu
100 105 110

Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys
115 120 125

Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu
130 135 140

Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr
145 150 155 160

Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln
165 170 175

Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu
180 185 190

Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu
195 200 205



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 464 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 
GTTCCAAGCC TAACCCATCT TTGTCGTTTG GAAATTCGGG CCAGTCTAAA AGCAGAGCAC 60

CTTCACTCTG ACATTTTCAT CCATCAGTTG CCACTTCCCA GAAGTCTGCA GAACTATTTG 120 

CTCTATGAAG AGGTTTTAAG AATGAATGAG ATTCTAGAAC CAGCAGCTAA TCAGGATGGA 180 

GAAACCAGCA AGGCCACCTG ACACAGGTCC TTTAATTCTG TTTAGTCACA AAAGACGGCT 240 

TGTGTGACTG TTTGGATTTG GTGATCAAAT GTCCATGTTT ACAGTTGCTT TTCCCAGTTT 300

GTGTCTTTCC CAATATTGTG AACCTTATCC ATCTTGCCTT ACTCAGTTTT ATTTCTAGTG 360

CACTTTGTTG TGTATTATTT GTTTACCTGA CCATTTTCTA CTTTATTCTG CTAATAAACT 420 

GTAATTCTGA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA 464 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 747 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

GGGGATCGAA AGCGGGGGCT TCTGGGACGC AGCTCTGGAG ACGCGGCCTC GGACCAGCCA 60

TTTCGGTGTA GAAGTGGCAG CACGGCAGAC TGGTCAAACA AATGGATTTT ACAGAGGCTT 120

ACGCGGACAC GTGCTCTACA GTTGGACTTG CTGCCAGGGA AGGCAATGTT AAAGTCTTAA 180 

GGAAACTGCT CAAAAAGGGC CGAAGTGTCG ATGTTGCTGA TAACAGGGGA TGGATGCCAA 240 

TTCATGAAGC AGCTTATCAC AACTCTGTAG AATGTTTGCA AATGTTAATT AATGCAGATT 300

CATCTGAAAA CTACATTAAG ATGAAGACCT TTGAAGGTTT CTGTGCTTTG CATCTCGCTG 360 

CAAGTCAAGG ACATTGGAAA ATCGTACAGA TTCTTTTAGA AGCTGGGGCA GATCCTAATG 420

CAACTACTTT AGAAGAAACG ACACCATTGT TTTTAGCTGT TGAAAATGGA CAGATAGATG 480 
TGTTAAGGCT GTTGCTTCAA CACGGAGCAA ATGTTAATGG ATCCCATTCT ATGTGTGGAT 540

GGAACTCCTT GCACCAGGCT TCTTTTCAGG AAAATGCTGA GATCATAAAA TTGCTTCTTA 600

GAAAAGGAGC AAACAAGGAA TGCCAGGATG ACTTTGGAAT CACACCTTTA TTTGTGGCTG 660

CTCAGTATGG CCAAGCTAGA AAGCTTTGAA GCATACTTAT TTCATCCGGG TGCAAATGTC 720

AATTGTCAAG CCTTGGACAA AGCTACC 747



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1018 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

CACAAATGGG ACCATACAAA AATCTTGGAC TTGTTAATAA CCACTTACTA ACCGGGACCT 60 

GTGACACTGG GCTAAACAAA GTAAGTCCCT GTTTACTCAG CAGTGTTTGG GGGACATGAA 120

GGATTGCCTA GAAATATTAC TCCGGAATGG TCTACAGCCC AGACGCCCAG GCGTGCCTTG 180

TTTTTGGATT CAGTTCTCCT GTGTGCATGG CTTTCCAAAA GGAGGTGGAG CTGTAGTTCT 240

TTGGAATTGT GAACATTCTT TTGAAATATG GAGCCCAGAT AAATGAACTT CATTTGGCAT 300 

ACTGCCTGAA GTACGAGAAG TTTTCGATAT TTCGCTACTT TTTGAGGAAA GGTTGCTCAT 360

TGGGACCATG GAACCATATA TATGAATTTG TAAATCATGC AATTAAAGCA CAAGCAAAAT 420

ATAAGGAGTG GTTGCCACAT CTTCTGGTTG CTGGATTTGA CCCACTGATT CTACTGTGCA 480 

ATTCTTGGAT TGACTCAGTC AGCATTGACA CCCTTATCTT CACTTTGGAG TTTACTAATT 540

GGAAGACACT TGCACCAGCT GTTGAAAGGA TGCTCTCTGC TCGTGCCTCA AACGCTTGGA 600 

TTCTACAGCA ACATATTGCC CACTGTTCCA TCCCTGACCC ATCTTTGTCG TTTGGAAATT 660

CGGTCCAGTC TAAAATCAGA ACGTCTACGG TCTGACAGTT ATATTAGTCA GCTGCCACTT 720

CCCAGAAGCC TACATAATTA TTTGCTCTAT GAAGACGTTC TGAGGATGTA TGAAGTTCCA 780

GAACTGGCAG CTATTCAAGA TGGATAAATC AGTGAAACTA CTTAACACAG CTAATTTTTT 840

TCTCTGAAAA ATCATCGAGA CAAAAGAGCC ACAGAGTACA AGTTTTTATG ATTTTATAGT 900

CAAAAGATGA TTATTGATTG TCAGATAGGT TAGGTTTTGG GGGGCCAGTA GTTCAGTGAG 960

AATGTTTATG TTTACAACTA GCCTTCCCAG TAAAAAAAAA AAAAAAAAAA AAAAAAAA 1018



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1897 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

CGGGGGGCTG GGACCTGGGG CGTAACCGTC TCTACCACGA CGGCAAGAAC CAGCCAAGTA 60 

AAACATACCC AGCCTTTCTG GAGCCGGACG AGACATTCAT TGTCCCTGAC TCCTTTTTCG 120 

TGGCCCTGGA CATGRATGAT GGGACCTTAA GTTTCATCGT GGATGGACAG TACATGGGAG 180 

TGGCTTTCCG GGGACTCAAG GGTAAAAAGC TGTATCCTGT AGTGAGTGCC GTCTGGGGCC 240 

ACTGTGAGAT CCGCATGCGC TACTTGAACG GACTTGATCC TGAGCCCCTG CCACTCATGG 300 

ACCTGTGCCG GCGTTCGGTG CGCCTAGCGC TGGGAAAAGA GCGCCTGGGT GCCATCCCCG 360 

CTCTGCCGCT ACCTGCCTCC CTCAAAGCCT ACCTCCTCTA CCAGTGATCC ACATCCCAGG 420

ACCGCCATAC GACAGCCATC TGGTGCCAAR TCACTGAGCC CGTTGGGGTC CGCCGACCCC 480 

TGCGCCTGGG ATGGAAGCCC ACCTCAGCCA TGGGCAGACG TGCCCCCTCA TCCTACCGGC 540 
TGCCTCTGCT GGGGGAACCT ATGCCAACGG ACTTCTCCCT TCCCAACACT GGCTGAAGCA 600

GCAGCACCCA GGCCCTTCCC TGAACCAGAT GCAGAGAATA AACTATG7VAA ACCTCTCTCA 660

GGCGCCTTCT GCTCTCAGGT GGAGTGGGCT GCCCCCCACT CTCTGCAGAG AGAGGCTACA 720 

CCCACCTGGG GGGTCCTGGG AGGTAAGACT AGTAGGAGGT GCCAGGGCTG ARTCCAAAAG 780

CAGGAATGGC CAGGAMCAGG CCATACAGAT GAAGCTCAGG ATGTCACATA CCATGGACAM 840 

TGAGACAGAA CCCCAGGTTG GAMTTCCCTT GGGCCAACGA GTGCCAGCTT TAATGTCAGC 900

TGCMGGTGCT CTGTGGCCTG TATTTATTCT TTAAACAGTA GCAAAGGCCA TTTATTTATT 960

CCACTTAGAA AGGAAACCTT GGTGGGTGGY TTCCCTCGAT GTGCTTTCCC CCACCTCCCT 1020 

GGAATGTGTG TGCCACACCT GTCCTTGTCC CAGGCCAGGA CTGTGGCACA TGAGCTGGTG 1080 

TGCACAGATA CACGTATGTC GTCGTGCATG ACCCCTGACT AGTTCCTAAG TAGCCCTGCA 1140 

CCAAGCACCA GAGCAGACCC CAAGAGAGGC CCGTGCAAGT CCCCATGTCC CCAGGTCCCT 1200

GCTTCTGTTG CCTTGGGACT CATACACCGG CACACGTGTT TCAGCCTCTT GACTTCCATG 1260

AGCTTCGAAT TTTGCCCCCG ATTCTTCTGA TATTTCCCAT TGGCATCCTC CAAAGCTCTG 1320 

GGCCTGGAGG GCATTAGGAC ACATGGAATG AGTGGGGTCT CCAGCCCCTG GGAAAGCCAC 1380

TGGCAAGGCA GGATTAGAAA GACCAAGAGC AGGGTGGGGC GCCATGAAGC CTGTATGCCT 1440 

CTCAGGCTCA AGACCCCGCC ACACACCCAC TCAAGCCTCA GAAGTGGTGT GTAGGGCAGC 1500 

CCCAGGAGAG GAATGCCTGT CCTAGCAGCA CGTACATGGA GCACCCCACA TGTGCTCCAG 1560 

CCCTCTGGCT GTTTCTCTTG CTCTAGAATC AACTCCCTAC ATTGGGAATG TAGCCATTTG 1620 

GTAGAGGACT TGCCTAGCCT GCAGGAAGCT CACGTTCCAT CCCCTGCACC AAGGAGAATC 1680 

AAAGCTCAGG AGGCTGAGGC AGGAGGATTG CTGTCAGTGG TGTACAGAGG TCATGGCCAT 1740

CCTGGGCTAT ATTAAACCTT GTCCTTTAAG AAAAAGAAAA GAAATCAACT TCCATTGAAT 1800 

CTGAGTTCTG CTCATTTCTG CACAGGTACA ATAGATGACT TKATTTGTTG AAAAATGKTT 1860

AATATATTTA CMTATATATA TATTTGTAAG AAGCATT 1897 
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 134 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 



Gly Gly Trp Asp Leu Gly Arg Asn Arg Leu Tyr His Asp Gly Lys Asn
1 5 10 15

Gln Pro Ser Lys Thr Tyr Pro Ala Phe Leu Glu Pro Asp Glu Thr Phe
20 25 30

Ile Val Pro Asp Ser Phe Phe Val Ala Leu Asp Met Xaa Asp Gly Thr
35 40 45

Leu Ser Phe Ile Val Asp Gly Gln Tyr Met Gly Val Ala Phe Arg Gly
50 55 60

Leu Lys Gly Lys Lys Leu Tyr Pro Val Val Ser Ala Val Trp Gly His
65 70 75 80

Cys Glu Ile Arg Met Arg Tyr Leu Asn Gly Leu Asp Pro Glu Pro Leu
85 90 95

Pro Leu Met Asp Leu Cys Arg Arg Ser Val Arg Leu Ala Leu Gly Lys
100 105 110

Glu Arg Leu Gly Ala Ile Pro Ala Leu Pro Leu Pro Ala Ser Leu Lys
115 120 125

Ala Tyr Leu Leu Tyr Gln
130



(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 265 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

AAGGGTAAAA AACTGTATCC TGTAGTGAGT GCCGTCTGGG GCCACTGTAG ATCCGAATGC 60 

GCTACTTGAA CGGACTCGAT CCCGAGACTG CCGCTCATGG ATTTGTGCCG TCGCTCGGTG 120 

CGCCTGGCCC TGGGGAGGGA GCGCCTGGGG GAGAACCACA CCTGCCGCTG CCGGCTTCCC 180 

TCAAGGCCTA CCTCCTCTAC CAGTGACGTT CGCCATCATA CCGCCAGCGC GACAGCCACC 240 

TGGTGCCAAC TCACTGAGCC GCCTG 265

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2438 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

AAGTGGCGGC GGTCCCTGGA GAGCAGGCGG AGGCAGCGGC AAGTCTGACT CTGGGCTGAC 60 

CGTGGAGCCG GGGCGGGGGC TGACAGCCAG GCCTCCGCCT GGCGGGAGCC GCACGAGGAG 120 

CGGGAGTGGC CGGGCCTCTC TTCCGCGCTT GAGCGAGCGC CGGGTGATGG CGGTGGTGAT 180 

GGCGGCAGGC GCTCGGACAG CTCCGCTTGA GCTGAGCTCG GAGAGATCCG TCCAGAAAGT 240 
GCCCAGAAGA AACTTCCTCT TAGAAAAGCT GAAAAACACA RTATTTATAA CACTGGAAAT 300

TGTAAAGAAT TTGTTTAAAA TGGCTGAAAA CAATAGTAAA AATGTAGATG TACGGCCTAA 360

AACAAGTCGG AGTCGAAGTG CTGACAGGAA GGATGGTTAT GTGTGGAGTG GAAAGAAGTT 420 

GTCTTGGTCC AAAAAGAGTG AGAGTTGTTC TGAATCTGAA GCCATAGGTA CTGTTGAGAA 480 

TGTTGAAATT CCTCTAAGAA GCCAAGAAAG GCAGCTTAGC TGTTCGTCCA TTGAGTTGGA 540 

CTTAGATCAT TCCTGTGGGC ATAGATTTTT AGGCCGATCC CTTAAACAGA AACTGCAAGA 600 

TGCGGTGGGG CAGTGTTTTC CAATAAAGAA TTGTAGTGGC CGACACTCTC CAGGGCTTCC 660 

ATCTAAAAGA AAGATTCATA TCAGTGAACT CATGTTAGAT AAGTGCCCTT TCCCACCTCG 720

CTCAGATTTA GCCTTTAGGT GGCATTTTAT TAAACGACAC ACTGTTCCTA TGAGTCCCAA 780 

CTCAGATGAA TGGGTGAGTG CAGACCTGTC TGAGAGGAAA CTGAGAGATG CTCAGCTGAA 840

ACGAAGAAAC ACAGAAGATG ACATACCCTG TTTCTCACAT ACCAATGGCC AGCCTTGTGT 900 

CATAACTGCC AACAGTGCTT CGTGTACAGG TGGTCACATA ACTGGTTCTA TGATGAACTT 960 

GGTCACAAAC AACAGCATAG AAGACAGTGA CATGGATTCA GAGGATGAAA TTATAACGCT 1020 

GTGCACAAGC TCCAGAAAAA GGAATAAGCC CAGGTGGGAA ATGGAAGAGG AGATCCTGCA 1080 

GTTGGAGGCA CCTCCTAAGT TCCACACCCA GATCGACTAC GTCCACTGCC TTGTTCCAGA 1140 

CCTCCTTCAG ATCAGTAACA ATCCGTGCTA CTGGGGTGTC ATGGACAAAT ATGCAGCCGA 1200 

AGCTCTGCTG GAAGGAAAGC CAGAGGGCAC CTTTTTACTT CGAGATTCAG CGCAGGAAGA 1260

TTATTTATTC TCTGTTAGTT TTAGACGCTA CAGTCGTTCT CTTCATGCTA GAATTGAGCA 1320

GTGGAATCAT AACTTTAGCT TTGATGCCCA TGATCCTTGT GTCTTCCATT CTCCTGATAT 1380

TACTGGGCTC CTGGAACACT ATAAGGACCC CAGTGCCTGT ATGTTCTTTG AGCCGCTCTT 1440

GTCCACTGCC TTAATCCGGA CGTTCCCCTT TTCCTTGCAG CATATTTGCA GAACGGTTAT 1500 

TTGTAATTGT ACGACTTACG ATGGCATCGA TGCCCTTCCC ATTCCTTCGC CTATGAAATT 1560

GTATCTGAAG GAATACCATT ATAAATCAAA AGTTAGGTTA CTCAGGATTG ATGTGCCAGA 1620
GCAGCAGTGA TGCGGAGAGG TTAGAATGTC GACCTGCATA CATATTTTCA TTTAATATTT 1680

TATTTTTCTT ATGCCTCTTT GAATTTTTGT ACAAAGGCAG TTGAATCAAA TAAAACTGTG 1740 

CCCTAAGTTT TAATTCCAGA TCAATTTATT TTTTTTATGA TACACTTGTT ATATATTTTT 1800 

AAGCAGGTGT TTGGTTTTGT TTTTACCATA TAAATTTACA TATGGTCCAG GCATATTTAC 1860 

AATTTCAAGG CATTGCATAT ACATTTGAAT ATTCTGTATT TTTTAAATAA TCTTTTGTTC 1920 

TTTCCTATGT GTGAAATATT TTGCTAATCT ATGCTATCAG TATTCTTGTA TGACCGAATA 1980

GTTACCTATT CTCTTTTCAT CTTGAAGATT TTCAGTAAAG AGTGTTGTAA TCAATCCATT 2040 

ATAATGTAAT TGACTTTTGT AATTTGCCAA TAGGAGTGTT AAACAACAAA ATGATTTAAA 2100 

ATGAAACTTA ATGTATTTTC ATTTTAAATA TTAACTAAAC CAAGTTTGTT TGTTAGTTAT 2160 

TCTAGCCAAT AAGAAAAGAG AATGTAGCAT CCTAGAGGTG TATTTGTTCT GCAGTTTGGC 2220

AGGACCGTCA GTTAGTCCAA ATAAACATCC CCTCAGCGTG GAGGCGAATG GAACCTGTGC 2280

TCCTTTCTTA CGGGAAGCTT TGCAAAGCAA AATAGCAGGG TTACAAGCTT GGAGTTGTTA 2340

AGGCAACTAG AGTTTTCTCT ATTAATTTAT AGACTGTTGT TGCACCTACT TAGCTCTTTT 2400

TTGGGAACTC TAGTTCCCAG GGGAAAATAC CTCGTGCC 2438
(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 542 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

Ser Gly Gly Gly Pro Trp Arg Ala Gly Gly Gly Ser Gly Lys Ser Asp 
1 5 10 15

Ser Gly Leu Thr Val Glu Pro Gly Arg Gly Leu Thr Ala Arg Pro Pro
20 25 30 

Pro Gly Gly Ser Arg Thr Arg Ser Gly Ser Gly Arg Ala Ser Leu Pro 
35 40 45 

Arg Leu Ser Glu Arg Arg Val Met Ala Val Val Met Ala Ala Gly Ala 
50 55 60 

Arg Thr Ala Pro Leu Glu Leu Ser Ser Glu Arg Ser Val Gln Lys Val
65 70 75 80 

Pro Arg Arg Asn Phe Leu Leu Glu Lys Leu Lys Asn Thr Xaa Phe Ile
85 90 95 

Thr Leu Glu Ile Val Lys Asn Leu Phe Lys Met Ala Glu Asn Asn Ser
100 105 110 

Lys Asn Val Asp Val Arg Pro Lys Thr Ser Arg Ser Arg Ser Ala Asp 
115 120 125 

Arg Lys Asp Gly Tyr Val Trp Ser Gly Lys Lys Leu Ser Trp Ser Lys 
130 135 140 

Lys Ser Glu Ser Cys Ser Glu Ser Glu Ala Ile Gly Thr Val Glu Asn
145 150 155 160

Val Glu Ile Pro Leu Arg Ser Gln Glu Arg Gln Leu Ser Cys Ser Ser
165 170 175 

Ile Glu Leu Asp Leu Asp His Ser Cys Gly His Arg Phe Leu Gly Arg
180 185 190 

Ser Leu Lys Gln Lys Leu Gln Asp Ala Val Gly Gln Cys Phe Pro Ile
195 200 205 

Lys Asn Cys Ser Gly Arg His Ser Pro Gly Leu Pro Ser Lys Arg Lys 
210 215 220 

Ile His Ile Ser Glu Leu Met Leu Asp Lys Cys Pro Phe Pro Pro Arg
225 230 235 240 

Ser Asp Leu Ala Phe Arg Trp His Phe Ile Lys Arg His Thr Val Pro
245 250 255 



SUBSTITUTE SHEET (RULE 26)

WO 98/20023

PCT/AU97/00729



Met Ser Pro Asn Ser Asp Glu Trp Val Ser Ala Asp Leu Ser Glu Arg 
260 265 270 

Lys Leu Arg Asp Ala Gln Leu Lys Arg Arg Asn Thr Glu Asp Asp Ile
275 280 285 

Pro Cys Phe Ser His Thr Asn Gly Gln Pro Cys Val Ile Thr Ala Asn
290 295 300 

Ser Ala Ser Cys Thr Gly Gly His Ile Thr Gly Ser Met Met Asn Leu
305 310 315 320 

Val Thr Asn Asn Ser Ile Glu Asp Ser Asp Met Asp Ser Glu Asp Glu
325 330 335 

Ile Ile Thr Leu Cys Thr Ser Ser Arg Lys Arg Asn Lys Pro Arg Trp
340 345 350 

Glu Met Glu Glu Glu Ile Leu Gln Leu Glu Ala Pro Pro Lys Phe His
355 360 365

Thr Gln Ile Asp Tyr Val His Cys Leu Val Pro Asp Leu Leu Gln Ile
370 375 380 

Ser Asn Asn Pro Cys Tyr Trp Gly Val Met Asp Lys Tyr Ala Ala Glu 
385 390 395 400 

Ala Leu Leu Glu Gly Lys Pro Glu Gly Thr Phe Leu Leu Arg Asp Ser 
405 410 415 

Ala Gln Glu Asp Tyr Leu Phe Ser Val Ser Phe Arg Arg Tyr Ser Arg
420 425 430 

Ser Leu His Ala Arg Ile Glu Gln Trp Asn His Asn Phe Ser Phe Asp
435 440 445 

Ala His Asp Pro Cys Val Phe His Ser Pro Asp Ile Thr Gly Leu Leu
450 455 460 

Glu His Tyr Lys Asp Pro Ser Ala Cys Met Phe Phe Glu Pro Leu Leu 
465 470 475 480 

Ser Thr Pro Leu Ile Arg Thr Phe Pro Phe Ser Leu Gln His Ile Cys
485 490 495 




Arg Thr Val Ile Cys Asn Cys Thr Thr Tyr Asp Gly Ile Asp Ala Leu
500 505 510 

Pro Ile Pro Ser Pro Met Lys Leu Tyr Leu Lys Glu Tyr His Tyr Lys
515 520 525 

Ser Lys Val Arg Leu Leu Arg Ile Asp Val Pro Glu Gln Gln
530 535 540 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4999 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

CCCTCTGGGC AAGCCGCCCC CCCCCCACCC ATCTACCACA CACACACACA CACACACACA 60 

CACACATTCA GACCTTGGGG CAAAAACAAA GCAAAATAAC AACAACAAAA ACACTGCCTG 120

TGGAAAGTCC TTACTTCAGG AAGGTTGGCA GATGAGGAGC AAGGGAACAT TTTATCAGGA 180 

CTGCCACAAA GGAGTCTTTT TTTTTAATGG TTTTTCAAGA CAGGGTTTCT CTGTATAGCC 240 

CTGGCTGTCC TGGAGCTCAC TTTGTAGACC AGGCTGGCCT CGAACTCAGA AATTCGCCTG 300 

CCTCTGCCTC CTGAGTGCTG GGATTAAAGG CGTGCAGCAC CATGTCCAAC TGGCATTTTC 360

TCAATTAAGG TTCGTTCCTT TCAGATAACT CTAGGTTCTG GGTCAAGCTG ACACAAGGCT 420

ACACAGCACA GTTTGTATGC CACATTCAGT TCAGAAGACA CCCAACCTCC CTGGAACTGG 480

AACTTATGCA CATTTGTGAG CTTCCACTTG GGAGTGGGAA CCTGAACTGG GTCCTCTGCA 540 

AGAGCAGCCG TGCTCTTAAC TGCTGAGCCA TTTCAGCAGC CTCACATCAG AATTAAGTTA 600

GAAATTAGCCG GGTATGAATC ATACCCTTAG AATCCTAGCA TCTGAAAGCA GAGCTAAGAG 660

AAACAGGGAT TCAAGACCAG CTCTTGGCTA CAGAGCCCGT CCTGTCCTAG GATGGGCTAC 720 

AAGAGACTAT TTCAAAGCCA TCCAT^CAAC AATAACTACA ACAACAACAA GGTTAAAATT 780

AGGCTGGGCA CAGGGTACAC ACCTTTAATG CCAACACTCA GGAGGCAGAG GCAGGCTGAT 840 

CAGTGTGAGT TTGAGTTCAA CGTGGTCTAC ATAGGGAGTT CTAGGCCAGC AGAGGTTACA 900

GTCTCTCTCT CTCTCTCTCT CTCTCTCTCT CTCTCACACA CACACACACA CACACACACA 960 

CACACACACA CACACACGGT GGCATTATGG GATTTTTTTG GGATAAGGTT TCTCTGTCTA 1020 

GCCCTGGCAT AGATTCACTC TGTAGACTAG GCTAGCCTTG AACTCAGAGA TCCGCCTGCC 1080 

TCTGCCTCCC AAGTGCTGGG ATTATAGGTG TTGCACCACC ACTGCCCAGC CACTTTGGGA 1140 

TTTTTGAACT GTTATCAAGA GGCTTTCGAG GAGGTCAAAC TTCAACAGCA ACCTCTCCAT 1200

GATAATGTAG CTAATGATCA AACGACACTC AAAACTTAAC CCTTAAAGCA CACATCCACC 1260

AGACAGCGTG CCCACTCGTA GTTCCATTAC TCAGGAGGCT GAAGCAGGAG GATGAAGGAC 1320 

TAAGGCTTCA GCAACCTAGG GAGCCGCAGG GGACAGTAGT CTCAATCCCT ACATTCTCCT 1380 

GAACACAGGA GCAGGAGTTC AGGAAGGGTG TCAAGGCCGC TTACTGATCT TAGGGCCTCA 1440 

GGAATGACTA GCTCAGGCAG AGAGAGCAAA GGTCTCCAGT GGAGAAGTCT ACACACACAC 1500 

ACACACACAC ACACACACAC ACACACACAC AGAATCCAAG GCGATGACGT CATCAAAGGG 1560 

TTAATTCTAG TCTGGGATGG GGGGGAGGGT GGGGCACGCA GCTGTCAGGT GGCTTTGGAA 1620

AAATAAACTG CTGAAGAGTC TGACGCCAGG GAGTCCTGGG AGGGACAAGA GGTTACCCAC 1680 

TCAAAGAGTG TGCTCCACAA AGCATGCGCG CTTGTCCACG TCTGGAGTCG TCACTTATTT 1740 

TTTGCCTGGA TTCTTTGTAG CCGGTGGGTT CTCAAGGCGG TAAGTGGTGT GGCCGCCGTG 1800 

GTCTGGGAGG TGACGATAGG GTTAATCGTC CACAGAGCCC AGGGGCGGAG CGCGGGCGGG 1860 

CGTCCGCAGC CCCGCTGGAG CCGGAAGCAG TGGCTGGTCA GGGGCGCTTC TAGCCTTCCC 1920 
TATCTGTACT TCCACAGAGG TCTCTGCGAG CTAGGGGGAC AGTGAGGTGC GGGGTAGGGG 1980

CCCGGCGTTA GAGCCAGCAA GGGGACGGTT CACGGTAAGG TCTGAGGGAG AGAGAGCTCC 2040

TGAGAAACTT GGGGGGCGCG ACACAGATAG GGTGAAAGCA GAGTGATAGA CCTGGGATGG 2100 

TTAGGGGACC AAGGGAAGAC CAGGCTGGTT GGCATACACC GGTGAACGGA TGGGAGTCCT 2160 

AGGGAAAGAT GATGCGCCTA ACAGTCCTTT CTGTCTCCAC ACCACTCCAG GGGACGATCC 2220

GGAGCTCAAC TTTCAAAAGC GAGACGCCCC AGCAAGCCTG TTTTGAGAAG TTCTTCAGCG 2280 

GCTCTCCTCA TGGGCCAGAC GGCCCTGGCA AGGGGCAGCA GCAGCACCCC TACCTCGCAG 2340

GCTCTGTACT CGGACTTCTC TCCTCCCGAG GGCTTGGAGG AGCTCCTGTC TGCTCCCCCT 2400

CCTGACCTGG TTGCCCAACG GCACCACGGC TGGAACCCCA AGGATTGCTC CGAGAACATC 2460

GATGTCAAGG AAGGGGGTCT GTGCTTTGAG CGGCGCCCTG TGGCCCAGAG CACTGATGGA 2520

GTCCGGGGGA AACGGGGCTA TTCGAGAGGT CTGCACGCCT GGGAGATCAG CTGGCCCCTG 2580 

GAGCAAAGGG GCACACACGC CGTGGTGGGC GTGGCCACCG CCCTCGCCCC GCTGCAGGCT 2640

GACCACTATG CGGCGCTTTT GGGCAGCAAC AGCGAGTCCT GGGGCTGGGA TATTGGGCGG 2700 

GGAAAATTGT ATCATCAGAG TAAGGGCCTC GAGGCCCCCC AGTATCCAGC TGGACCTCAG 27 60 

GGTGAGCAGC TAGTGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGGACT 282 0 

CTTGGCTACT CTATTGGGGG CACGTACCTG GGACCAGCCT TCCGTGGACT GAAGGGGAGG 2 880 

ACCCTCTATC CCTCTGTAAG TGCTGTTTGG GGCCAGTGCC AGGTCCGCAT CCGCTACATG 294 0 

GGCGAAAGAA GAGGTGAGAT ACGGACTAGG TGTGGGGAGA TCACTACTCT TGGCAATGGT 3 000 

TTGGGCTGGA AACTCATGGT TGGAGCACAG GAAGTAGGCT TCTTGTCACT TTGGCCTGTC 3 060 

ACTTAGATGG CCTTGGATCT AGCTTCACTC CCAATCCCTA TTGGATGTGA TGCACAAATT 312 0 

CAGAGCCTTT GGGTCTCCCT CAGCTGAGGT GGCGGTGGAA ATGGAGGAAG AAGGAAGGGT 3180 

GCCTGAGCAG GATCTCAAGT TCAAGGATGC CTGGAGTTGC TTACTTACCT TGTCTTCCTT 32 40 

CTCTCTCCGC AGTCGAGGAA CCACAATCCC TTCTGCACCT GAGCCGCCTG TGTGTGCGCC 3300

ATGCTCTGGG GGACACCCGG CTGGGTCAAA TATCCACTCT GCCTTTGCCC CCTGCCATGA 3360

AGCGCTATCT GCTCTACAAA TGACCCAGTA GTACAGGGTG TGCTGGCACC CTACCGTGGG 3420

GACAGGTGGA GAGGCACCCG CTGGCCTAGA CAACTTTAAA AAGCTGGTGA AGCTGGGGGG 3480

GGGGGGCTGG ACCCCTTCAC CTCCCCTTCT CACAGGAGCA AGACATATAG AAATGATATT 3540

AAACACCATG GCAGCCTGGG ACAAAGAGGT TTTTGAAGTA AAAAATGAGA TGTATTGTCA 3600

CAACCTGTTT CATTATTGTT TTTTGTTTTG TTTTACACTC CCCCACCCCA GGCTAGAGCC 3660

CCATCACTGT CTTAAGGAAT TATGACAACC CACAAAGCTC AGGCCCAGGT GTTTATTTCC 3720

CTTACATGTA GGATGGTTCA CAAACACAAT ACAGGGGCTT TGGCACCGTG GGGGAGGGGA 3780

CTATCCCAGG CCTCTTAGGG TCTCATGTAT ACCGAATTCA GACCCGAAAG CTCTGAATTT 3840

CTGCATCAGA CATCCAGTAG AACTTGGGAG TGAAGCTAGA GCCAAGGCCA TCTAAGTGAC 3900

AGGCCAAAGT GACACGAAGC CCACTTCCTG TGCTCCAACC ATGAGTTTCC AGCCCAAACC 3960

AATGGAAGGT GATTTCACTT GTCAGGGCCC AAAGGGACAG TCAGTTCTAC TCCCTCCCCT 4020

CACTAGGAGC CACCTTGGTG ACAGTTGATT CTACCCACTG TAAGTGGTAA AGGGATTGGC 4080

CTGGTCCCAA CCATAATAGG GCGGTGGAAA CGGCTCAGGA GGGTACAGCG TGGATTAGGC 4140

CACAAGATGG GGCAGATGAT GTCATCAGAA GCATGTGACC GGTGGGAGCA GTTACTAAAC 4200

TTCTGGGCAA CCTAGTCCAT GCTATGCAGG CAGGTAGAGG GATGGGCAGT GCTCATTGTT 4260

TGGCATTGAT GATGTCCACA AATTCAGGCT TGAGAGATGC GCCACCCACA AGGAAGCCGT 4320

CCACGTCAGG CTGGCTTGCC AGCTCTTTGC AGGTTGCTCC AGTCACAGAA CCTGTACCAG 4380

GAACAAGAAG ACAGTTTGGT CAGGTCTATG ATCAGAACAC TTAAGCCCCA CCTCTCTGTG 4440

CAAGGCAGCC TCAGTCTGTC TTAGCCCATT TCCGTCTTAG CTAGAGCCAA AGCCACTCAC 4500

CTCCATAAAT GATCCGGGTG CTCTGAGCCA CCCCATCATT GACATTGGAT TTCAGCCATC 4560

CCCGGAGCTT CTCGTGTACT TCCTGTGCCT AGAAGGAGGA GGCAGAGCTA CTAAGTAAGC 4620

TCCTTCCTAT CTATCATTCA AGGAGTAAAA ACCACTGGTT CTCACATAGA GTTGAGTTTC 4680

CAGAAAAGCC CCGGGACCAG AGAGTGGCAA GGCTCCAATC CCACCAGGCT TGGAATGAAC

ATTTTTGGCA AAGTCACTCT CCTTGGTGAG TTTGGGGGCC CTCTGTCTCT AAAGGGGCTT

GGATGGGCTC CATAGCTGTG TGAGTCTGTT AAAGCCGGAC AGGCTGAGGA GCTCTGGGTA

GTTACCTGCT GAGGGGTTGC CGTCTTGCCA GTCCCAATGG CCCACACAGG TTCATAGGCC

AGGACCACCT TGCTCCAGTC TTTCACATTA TCTGTGGGGC AGAGAGGAGA GTGAGTAGGA

AGGAGCTGAC CCGCCAAGC
(2) INFORMATION FOR SEQ ID NO: 46: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 264 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 



Met Gly Gln Thr Ala Leu Ala Arg Gly Ser Ser Ser Thr Pro Thr Ser
1 5 10 15

Gln Ala Leu Tyr Ser Asp Phe Ser Pro Pro Glu Gly Leu Glu Glu Leu
20 25 30

Leu Ser Ala Pro Pro Pro Asp Leu Val Ala Gln Arg His His Gly Trp
35 40 45

Asn Pro Lys Asp Cys Ser Glu Asn Ile Asp Val Lys Glu Gly Gly Leu
50 55 60

Cys Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Val Arg Gly
65 70 75 80

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro
85 90 95

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu
100 105 110

Ala Pro Leu Gln Ala Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser
115 120 125

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser
130 135 140

Lys Gly Leu Glu Ala Pro Gln Tyr Pro Ala Gly Pro Gln Gly Glu Gln
145 150 155 160

Leu Val Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly
165 170 175

Thr Leu Gly Tyr Ser Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg
180 185 190

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ser Val Ser Ala Val Trp Gly
195 200 205

Gln Cys Gln Val Arg Ile Arg Tyr Met Gly Glu Arg Arg Val Glu Glu
210 215 220

Pro Gln Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Ala Leu
225 230 235 240

Gly Asp Thr Arg Leu Gly Gln Ile Ser Thr Leu Pro Leu Pro Pro Ala
245 250 255

Met Lys Arg Tyr Leu Leu Tyr Lys
260



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5615 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

GTACTTTCTT TATATCTCCA TAATTTTATT TACTATTACT ACATGATACA TTATTTTATA 60

AAAGTCTTTG TAACCTCCTT AAGGATTCAC TGCTTAATCT CCAGTGCTTA GCACAAATCA 120

TTAAATGCGA ACCAGAAACT CTTCCAAATG TGTTACATCT ATAACCTCAT TGGATTCTCA 180

CTACCAACCC CATGCAATAG ATACTAATGT GATCTCTGTC TTACAGAGGA AGAAACAGGC 240

ACAGGGAGGT TCAGTAATTT GCCCAAGGTC ATACACACAC TGGCCTTCAG GTATTCATGC 300

CCGGGGAGTC TGGTCCCACA GCTGGCATGT TTGCCATTAT ATTATATTGC CTCCTTATAG 360

TGTCGGCACT CATTAAGCAC ATTGACAGCT ATGCTTGGTG AGTGACTACT ATGTACCCAG 420

CTCTGTGCTA CATGCTTTAC CTGGATTATT TCAACTGCAC AACAACCCTG TGAGGTAACT 480 

ACCATCATTG CTCCTATTTT ACATAACAGA AAACTACAGA AATCTGGGGC TGGGCGTAGT 540 

GGCTCATGCC TGAAATCCCA GCACTTTGGG AGACCCTGTC TCTAAAAAAA ATTTTTTTTT 600 

GGCCGGACGT GGTGGCTCAC ACCTGTAATC TCAGCACTTT GGGAGGCTAA GGCAGGCAGA 660 

TCACAAGGTC AGGAGTTCTA GACCAGCCTG GCCAACATGG CAAAACCCTG TGTCTACTAA 720 

AAATACAAAA AATAGCTAGG CGTGGTGGCA GGTGCCTGTA ATCCCAGCTA CTCAGGAGGC 780

TGAGGCAGGA GAATCCCCTG AACCTGGGAG ATGGAGGTTA CAGAGAGCCG AGATCGTGCC 840 

GCTGCACTCC AGCCTGGGCA ACAAGAGCAA GACTCTGTCT CGAAAAAAAT AAAAATAAAA 900 

ATAAAAATAT TTTTTTAAAA ATTAGCTGGG TGTGGTAGCA CATGCCTGTA GTCCCAGCTA 960

CTTGGGAGGC TGAGGTAGGA GGATCACTTG AGCCCAGGAG GTCAAGGCTG CAGTGGGCTG 1020 

TGATGGCGCC ACTGCACTCT AGCCTTGGTG ACAGCAAGAC CCTGTCTCAA AAAAAAAAAA 1080 

AAGAGAAATC GGGCAACTTC CCCAAGATCG CGCAGTTAAC TAGTGGCATA GCTTCACTCA 1140

AACTCGAAGT CTTAATCAGG ACACTCTACC AAATGAGATC AACGGCTCAG TAATGGATTG 1200 

GCATCCAGTA TGAAGACTGG ACCAGCAGGG AGAACTATGA TGCGTACAGC CTAGAGCCTG 1260 

AAGCAGATTT CACAGCCTCA GAGGTGGCAC AGGCTGACTC ACAACCCGGG GCAGAAAGGG 1320

ACCAGCCCAG AAACAGTGAC CCAGAATCAC AGGGAAGTAG AAATGGGATT CGGCACAATG 1380

AAGCCCCTCC TTGACCCCAT GCTCCTTACC CTCAGGGGCG CAGGAGTTAG TCGCTCAGGC 1440 

GGCTCAAAGG TCTTGACGGT GGAGAACACC ATCCCCAGGG ATTCCCGACG CGGTGATGCC 1500 

ATCAAAGCGT TAATTCTGAG ATGGGCCTGC CCGGGTGCGG ACTCTGCCGC AGCAAGAGAA 1560 

GGGTTAACTG CCCCGGGCCT TCGCCGTGGG GGCGGGGCCT CGGGGAGGGT CACAGCCCGG 1620

GACTGAGACC CGAGGTTAAC CGCCCGGGGT GGGCTCCACG GGGGCGGGGC ATGCTCTCCG 1680 

CGGCTGCTGC CGGTATAGAG CGGTAACTGC CCAGGAGGGG GCGGGGCCCC ACAGGGGCGT 1740 

GGCCTCGGAG CTGCACGGCC GTGGGCGGCG ATGAGAGGGT TAAGCCCCAG AGGGCCCTGG 1800 

AGGGGCGGGG CCGCGGGACG GGCTCGGCCC AAGGGAGGAG CTGGGGGCGG AAGCGGCCGG 1860 

CGGTCTGCGC CCTGCGCGCC TCGGCTTCTT TCCGCCCGGC TCCTTCAGAG GCCCGGCGAC 1920

CTCCAGGGCT GGGAAGTCAA CCGAGGTTCG GGGGCAGCGG CGAGGGCTCC GGGCGAGTAA 1980 

GGGGGATGGT CCATGCTGAG GCCCAAATGG GGCGAACTCG CGAGAGTCTC TGGCGACCTG 2040

GATCAGATGG GGCGAGGGCA GATGAAGGGC CCAGGAGCTT TGGGGCAGCG AGGAGGGAGG 2100 

AGCGGGCCCG TTGGCAAACT TGGGTGAAAG GATGGGGTAC CTGGGTGACG AGCCCCCGCC 2160 

AGGATTCTGC TCTTCACGCC CCTTTTCTCC CAGCTCCCTT CCAGGTCAAT CCAAACTGGA 2220 

GCTCAACTTT CAGAAGAGAA AGACGCCCCA GCAAGCCTCT TTCGGGGAGT CCTCTAGCTC 2280

CTCACCTCCA TGGGCCAGAC AGCTCTGGCA GGGGGCAGCA GCAGCACCCC CACGCCACAG 2340

GCCCTGTACC CTGACCTCTC CTGTCCCGAG GGCTTGGAAG AGCTGCTGTC TGCACCCCCT 2400

CCTGACCTGG GGGCCCAGCG GCGCCACGGT TGGAACCCCA AAGACTGTTC AGAGAACATC 2460

GAGGTCAAGG AAGGAGGGTT GTACTTTGAG CGGCGGCCCG TGGCCCAGAG CACTGATGGG 2520

GCCCGGGGTA AGAGGGGCTA TTCAAGGGGC CTGCACGCCT GGGAGATCAG CTGGCCCCTA 2580

GAGCAGAGGG GCACGCATGC CGTGGTGGGC GTGGCCACGG CCCTCGCCCC GCTGCAGACT 2640 

GACCACTACG CGGCGCTGCT GGGCAGCAAC AGCGAGTCGT GGGGCTGGGA CATCGGGCGG 2700

GGGAAGCTGT ACCATCAGAG CAAGGGGCCC GGAGCCCCCC AGTATCCAGC GGGAACTCAG 2760

GGTGAGCAGC TGGAGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGAACT 2820

CTGGGCTACG CTATTGGGGG CACCTACCTG GGGCCAGCAT TCCGCGGACT GAAGGGCAGG 2880

ACCCTCTATC CGGCAGTAAG CGCTGTCTGG GGCCAGTGCC AGGTCCGCAT CCGCTACCTG 2940

GGCGAAAGGA GAGGTGAGGC CTGGGGCAGA CGTGGGGAGA ACTTTCTGTC CCTGGTGGCA 3000

GTGGTTTGGG ATGGAAACTC TTCTGACAAG AGCAGAGGGG ATGGACCTTC ATCCAGCCTG 3060 

CCTCAACCTC TGTTCAGTGC TGGGAAAGGC TAGGGGTCTT CACAGCTGTT ATTTAATTTA 3120

ACCCAACAGC AATAGAGGTG AAACAGGCTT GAGAAAGCAA CTTTCTCAAG TTCTCTTGGC 3180

CAGTAAATGG TGAACCTTCA GAATGGAGGG AGGAACTGCA GGGATGAGAG AATTCAGGAG 3240

ATATCAACCC CTGAGCAAGA GGTGCAAAGC GTTAGGTACT GGGTTTGATG TACAGGTCCA 3300

AAAGAAGGAT GGGCAGAGCC AGGTACCCAG GCTGTATACC GGATTCCCTG GGCTCTAACC 3360 

TGTCTCTGTG CCACATACCT ACTTCCTTCC TCAGCCACAC CTCTGGATGG AGACACTGGG 3420 

GCCCTGGGCA CCAGGGAGGA GAGCAGTGGA GGAGGCAGGG CCTTAGGGTG GGGCAGCAGG 3480

GGAGGAGCCT CCCCAGGAAC TGACTGGGTC CAGGGCTTGG AGCTGCTCTC TGCAGTTGTG 3540

TGGGCTGTAG AGTGGAGGGC CATCCCTCCT CACCTCAGCC CCAGCTCCCA AGCCTCTGGA 3600

GTCAAAGCCT GGGCCAGCTC CACCACTGTC AGAGCCACCT TGGCCTGTTG TTTAGAGGGC 3660

CTTAGCCAGC TCTTCACCCC CAGCTCTGAC TAGGGATGTG TGAAATCTTA TCTGGGAGGC 3720

AGAACTTCCG GGTATCTCAA ATTCCCCTTT CAGCCAGGTG GGCACACTCG AAGCAGGAAA 3780

GCAGAAAGGC ATCTGAGTAG GACCCCGTAG TTTGAGGACA TCTGGCTGGT GGCTGCACCC 3840

ATACTTACAT TCCCCTCCTT CTCTCTCCCA GCGGAGCCAC ACTCCCTTCT GCACCTGAGC 3900

CGCCTGTGTG TGCGCCACAA CCTGGGGGAT ACCCGGCTCG GCCAGGTGTC TGCCCTGCCC 3960

TTGCCCCCTG CCATGAAGCG CTACCTGCTC TACCAGTGAG CCCTGTGATA CCACAGACTG 4020

TGCTGAGGTC TTGCCACCAC CCCTCCCCTT GGGGAGGTGG GGAGGCACTG CTGGCCTAGA 4080 

CCAGCTGCTG AAAGCTGGTG AGGCTGAGCC CCTACCCCAA CCCAAGCTCT GCGGAAATCA 4140

ACAGCCCCAG AGCCACTTGG AGGGAGGAAG AAAGGGAGCC GGCGTTCAAG GCTATGACAG 4200

TCTGCTACGC AAAACATTTT TTCAAGTAAA AATAGTAAGA GATGTTGTTA TAGAAACCTG 4260

TTCTTGTTTT TTTTTTTTTC TTGCACAAAT GATCATTTAT ATAGCTGCCT CAAAAAGGAA 4320

GATTATCTGG GCAAGTCCAG TGAAGGCAGA CAAACCACAA GACCTAGTGC CAGGTTTATT 4380

CCCTCACATG GGTGGTTCAC ATACACAGCA CAGAGGCACG GGCACCATGG GAGAGGGCAG 4440 

CACTCCTGCC TTCTGAGGGG ATCTTGGCCT CACGGTGTAA GAAGGGAGAG GATGGTTTCT 4500 

CTTCTGCCCT CACTAGGGCC TAGGGAACCC AGGAGCAAAT CCCACCACGC CTTCCATCTC 4560

TCAGCCAAGG AGAAGCCACC TTGGTGACGT TTAGTTCCAA CCATTATAGT AAGTGGAGAA 4620

GGGATTGGCC TGGTCCCAAC CATTACAGGG TGAAGATATA AACAGTAAAG GAAGATACAG 4680 

TTTGGATGAG GCCACAGGAA GGAGCAGATG ACACCATCAG AAGCATATGC AGGGAAAGGG 4740 

CAGTTACTGG GCTTCTGGGC TGCTTAGTCC CTGGCTTGGC AGGAAGGGTA GGGAAGATGG 4800

ATGGGGCTCA TTGTTTGGCA TTGATGATGT CCACGAATTC GGGCTTGAGG GAAGCACCAC 4860 

CCACAAGGAA GCCATCCACA TCAGGCTGGC TGGCCAGCTC CTTGCAGGTT GCCCCAGTCA 4920

CAGAGCCTGG GAAGGGAGCA GAACAAGGGC TTGGTCAAGA ATGGGATGAG TCTGCCCCAT 4980 

CCCCACCTCC ATGTCCGAGG GCTCAGTCTA GTCCTCAGCC CACTCCACCT CAGCCGGGAA 5040 

CCAAAGCCAC TCACCTCCAT AAATGATACG GGTGCTCTGA GCCACCGCAT CAGAGACGTT 5100 

GGACTTCAGC CATCCTCGGA GCTTCTCGTG TACTTCCTGG GCCTAGAACA AGAAGCTGGC 5160 

CTAAGTAAGA CCTTTTCTGC CTCTCTAAGA GGAAAAATCA CTGGCACCAG TGGACACTTA 5220 

GTGTGGTTTC TGACTGAGTC AGAGTACCAG GGCTCTGATC CAAGCCAGGC CCTGGACTGG 52 80 

ATGCCCTTGG ACAAGTCACT GTCTCTGGGT TCAAGGTCTC TGTGTCTTTG AAATAAGGGG 5340 

TTGCCCCATG TGGGCTGTGT CTGTCCAAAC CTATTGAGGC AGGCTGGGAT GAGGGCAGGG 5400

CTCCTGGGCC CGGTTACCTG TTGGGGTGTT GCAGTCTTGC CAGTACCAAT GGCCCACACA 5460

GGCTCATAGG CCAGGACGAC CTTGCTCCAG TCCTTCACGT TATCTGCAGG GCAGAGATAC 5520

AGATGGAGGG AAGGGTGAAC AAGAAAGAGC TCTCCAGCCA GGTTCTCCGG AGTACGAAGA 5580

ACGGTGGCCT ACTGCCCCCT AGTGGACATT GGGGG 5615

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS : single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Met Gly Gln Thr Ala Leu Ala Gly Gly Ser Ser Ser Thr Pro Thr Pro
1 5 10 15

Gln Ala Leu Tyr Pro Asp Leu Ser Cys Pro Glu Gly Leu Glu Glu Leu
20 25 30

Leu Ser Ala Pro Pro Pro Asp Leu Gly Ala Gln Arg Arg His Gly Trp
35 40 45

Asn Pro Lys Asp Cys Ser Glu Asn Ile Glu Val Lys Glu Gly Gly Leu
50 55 60

Tyr Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Ala Arg Gly
65 70 75 80

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro
85 90 95

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu
100 105 110

Ala Pro Leu Gln Thr Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser
115 120 125

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser
130 135 140

Lys Gly Pro Gly Ala Pro Gln Tyr Pro Ala Gly Thr Gln Gly Glu Gln
145 150 155 160

Leu Glu Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly
165 170 175

Thr Leu Gly Tyr Ala Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg
180 185 190

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ala Val Ser Ala Val Trp Gly
195 200 205

Gln Cys Gln Val Arg Ile Arg Tyr Leu Gly Glu Arg Arg Ala Glu Pro
210 215 220

His Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Asn Leu Gly
225 230 235 240

Asp Thr Arg Leu Gly Gln Val Ser Ala Leu Pro Leu Pro Pro Ala Met
245 250 255

Lys Arg Tyr Leu Leu Tyr Gln
260
260 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
AGCTAGATCTGGACCCTACAATGGCAGC 28 



(2) INFORMATION FOR SEQ ID NO: 50: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
AGCTAGATCT GCCATCCTAC TCGAGGGGCC AGCTGG 36
CLAIMS:

1. A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary 
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a 
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C 
wherein said protein comprises a SOCS box in its C-terminal region. 

2. A nucleic acid molecule according to claim 1 wherein the protein further comprises a
protein:molecule interacting region.

3. A nucleic acid molecule according to claim 1 wherein the protein:molecule interacting 
region is located in a region N-terminal of the SOCS box. 

4. A nucleic acid molecule according to claim 2 or 3 wherein the protein:molecule 
interacting region is a protein:DNA binding region or a protein:protein binding region. 

5. A nucleic acid molecule according to claim 4 wherein the protein:molecule interacting
region is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats.

6. A nucleic acid molecule according to any one of claims 1-5 wherein the SOCS box 
comprises the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.
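The SOCS box consensus recited above is, in effect, a sequence pattern, so it can be checked mechanically. The following Python sketch is illustrative only and forms no part of the claimed subject matter. It assumes one-letter amino acid codes and transcribes each position class literally into a regular expression. The names `SOCS_BOX`, `SEQ_ID_48` and `find_socs_box` are hypothetical, and the requirement that the box lie in the C-terminal region is not encoded in the pattern.

```python
import re

# Consensus SOCS box of claim 6 as a regular expression (one-letter codes).
# A bracketed class lists the residues allowed at a position; "." means any residue.
SOCS_BOX = re.compile(
    "[LIVMAP]"       # X1
    "."              # X2
    "[PTS]"          # X3
    "[LIVMAP]"       # X4
    ".."             # X5, X6
    "[LIVMAFYW]"     # X7
    "[CTS]"          # X8
    "[RKH]"          # X9
    ".."             # X10, X11
    "[LIVMAP]"       # X12
    "..."            # X13, X14, X15
    "[LIVMAPGCTS]"   # X16
    ".{1,50}"        # [Xi]n, n = 1 to 50
    "[LIVMAP]"       # X17
    ".."             # X18, X19
    "[LIVMAP]"       # X20
    "P"              # X21
    "[LIVMAPG]"      # X22
    "[PN]"           # X23
    ".{1,50}"        # [Xj]n, n = 1 to 50
    "[LIVMAP]"       # X24
    ".."             # X25, X26
    "[YF]"           # X27
    "[LIVMAP]"       # X28
)

# SEQ ID NO. 48 (263 residues) transcribed into one-letter code, 16 per chunk.
SEQ_ID_48 = (
    "MGQTALAGGSSSTPTP" "QALYPDLSCPEGLEEL" "LSAPPPDLGAQRRHGW"
    "NPKDCSENIEVKEGGL" "YFERRPVAQSTDGARG" "KRGYSRGLHAWEISWP"
    "LEQRGTHAVVGVATAL" "APLQTDHYAALLGSNS" "ESWGWDIGRGKLYHQS"
    "KGPGAPQYPAGTQGEQ" "LEVPERLLVVLDMEEG" "TLGYAIGGTYLGPAFR"
    "GLKGRTLYPAVSAVWG" "QCQVRIRYLGERRAEP" "HSLLHLSRLCVRHNLG"
    "DTRLGQVSALPLPPAM" "KRYLLYQ"
)


def find_socs_box(protein):
    """Return (start, end, motif) for the leftmost consensus match, or None.

    start and end are 0-based, end-exclusive indices, as in Python slicing.
    """
    match = SOCS_BOX.search(protein)
    if match is None:
        return None
    return match.start(), match.end(), match.group()


print(find_socs_box(SEQ_ID_48))
```

Under these assumptions, the leftmost match in SEQ ID NO. 48 spans residues 224 to 260, three residues short of the C-terminus. `re.search` returns a single leftmost match, so alternative alignments of the two variable-length gaps [Xi]n and [Xj]n are not enumerated.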

7. A nucleic acid molecule according to claim 6 wherein the protein modulates signal 
transduction. 
8. A nucleic acid molecule according to claim 7 wherein the signal transduction is modulated
by a cytokine or a hormone, a microbe or a microbial product, a parasite, an antigen or other 
effector molecule. 

9. A nucleic acid molecule according to claim 8 wherein the protein modulates cytokine- 
mediated signal transduction. 

10. A nucleic acid molecule according to claim 9 wherein the signal transduction is mediated 
by one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, 
IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.

11. A nucleic acid molecule according to claim 10 wherein the signal transduction is mediated 
by one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin.

12. A nucleic acid molecule according to claim 11 wherein the signal transduction is mediated
by IL-6. 

13. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence encodes 
an amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 
8, SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ 
ID NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 
46 or SEQ ID NO. 48 or an amino acid sequence having at least about 15% similarity to all or 
part of the listed sequences or a nucleotide sequence which hybridizes to the nucleic acid 
molecule under low stringency conditions at 42°C.

14. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence is 
substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ
ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO.
20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ
ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 
34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ 
ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a nucleotide sequence
having at least 15% similarity to all or a part of the listed sequences or a nucleotide sequence 
capable of hybridizing to the listed sequences under low stringency conditions at 42°C. 

15. A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary 
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a 
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C 
wherein said protein exhibits the following characteristics: 

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises 
the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein:

X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats
or other protein:molecule interacting domain in a region N-terminal of the SOCS box;
and 

(iii) modulates signal transduction. 

16. An isolated protein or a derivative, homologue or mimetic thereof comprising a SOCS 
box in its C-terminal region. 

17. An isolated protein according to claim 16 wherein the protein further comprises a 
protein:molecule interacting region. 

18. An isolated protein according to claim 17 wherein the protein:molecule interacting region 
is located in a region N-terminal of the SOCS box.

19. An isolated protein according to claim 16 or 17 wherein the protein:molecule interacting 
region is a protein:DNA binding region or a protein:protein binding region. 

20. An isolated protein according to claim 19 wherein the protein:molecule interacting region
is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats. 

21. An isolated protein according to any one of claims 16-20 wherein the SOCS box 
comprises the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.

22. An isolated protein according to claim 21 wherein the protein modulates signal 
transduction. 

23. An isolated protein according to claim 22 wherein the signal transduction is modulated 
by a cytokine or other endogenous molecule, a hormone, a microbe or a microbial product, a 
parasite, an antigen or other effector molecule. 

24. An isolated protein according to claim 23 wherein the protein modulates cytokine- 
mediated signal transduction. 

25. An isolated protein according to claim 24 wherein the signal transduction is mediated 
by one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, 
IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.
26. An isolated protein according to claim 25 wherein the signal transduction is mediated by
one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin.

27. An isolated protein according to claim 26 wherein the signal transduction is mediated by
IL-6.

28. An isolated protein according to claim 16 wherein said protein comprises an amino acid 
sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 8, SEQ ID 
NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ ID NO. 25, 
SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 46 or SEQ 
ID NO. 48 or an amino acid sequence having at least about 15% similarity to all or part of the 
listed sequences. 

29. An isolated protein according to claim 16 wherein said protein is encoded by a
nucleotide sequence substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7,
SEQ ID NO. 9, SEQ ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID
NO. 17, SEQ ID NO. 20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26,
SEQ ID NO. 27, SEQ ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID
NO. 33, SEQ ID NO. 34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39,
SEQ ID NO. 40, SEQ ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a
nucleotide sequence having at least 15% similarity to all or a part of the listed sequences or a
nucleotide sequence capable of hybridizing to the listed sequences under low stringency
conditions at 42°C.

30. An isolated protein or a derivative, homologue, analogue or mimetic thereof having the 
following characteristics: 

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises 
the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats
or other protein:molecule interacting domain in a region N-terminal of the SOCS box;
and 

(iii) modulates signal transduction. 

31. A method of modulating levels of a SOCS protein in a cell said method comprising 
contacting a cell containing a SOCS gene with an effective amount of a modulator of SOCS gene 
expression or SOCS protein activity for a time and under conditions sufficient to modulate levels 
of said SOCS protein. 

32. A method of modulating signal transduction in a cell containing a SOCS gene comprising 
contacting said cell with an effective amount of a modulator of SOCS gene expression or SOCS 
protein activity for a time sufficient to modulate signal transduction. 

33. A method of influencing interaction between cells wherein at least one cell carries a SOCS
gene, said method comprising contacting the cell carrying the SOCS gene with an effective 
amount of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient 
to modulate signal transduction. 

34. A method according to any one of claims 31-33 wherein signal transduction is mediated 
by a cytokine, a hormone, a microbe or a microbial product, a parasite, an antigen or other 
effector molecule. 

35. A method according to claim 34 wherein the cytokine is one or more of EPO, TPO, G-CSF,
GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.
36. A method according to claim 35 wherein the cytokine is one or more of IL-6, LIF, OSM,
IFN-γ and/or thrombopoietin.

37. A method according to claim 36 wherein the cytokine is IL-6. 

38. A method according to any one of claims 31-37 wherein the SOCS gene encodes a 
protein having a SOCS box comprising the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.

39. A method according to claim 38 wherein the SOCS gene comprises a nucleotide 
sequence selected from SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ 
ID NO. 11. SEQ ID NO. 13. SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO. 
20, SEQ ID NO. 22. SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ 
ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 3 1 , SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 
34. SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ 
ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47. 

40. A method according to claim 38 wherein the SOCS gene encodes a protein comprising 
an amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO.
8, SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ
ID NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO.
46 or SEQ ID NO. 48.



SUBSTITUTE SHEET (RULE 26)

WO 98/20023

PCT/AU97/00729






-159 cgaggctcaagctccgggcggattctgcgtgccgctctcg 

-120 ctccttggggtctgttggccggcctgtgccacccggacgcccggctcactgcctctgtct

-60 cccccatcagcgcagccccggacgctatggcccacccctccagctggcccctcgagtagg 

1 MVARNQVAADNAISPAAEPR 

1 ATGGTAGCACGCAACCAGGTGGCAGCCGACAATGCGATCTCCCCGGCAGCAGAGCCCCGA 

21 RRSEPSSSSSSSSPAAPVRP 

61 CGGCGGTCAGAGCCCTCCTCGTCCTCGTCTTCGTCCTCGCCAGCGGCCCCCGTGCGTCCC 

41 RPCPAVPAPAPGDTHFRTFR 

121 CGGCCCTGCCCGGCGGTCCCAGCCCCAGCCCCTGGCGACACTCACTTCCGCACCTTCCGC 

61 SHSDYRRITRTSALLDACGF 

181 TCCCACTCCGATTACCGGCGCATCACGCGGACCAGCGCGCTCCTGGACGCCTGCGGCTTC 

81 YWGPLSVHGAHERLRAEPVG 

241 TATTGGGGACCCCTGAGCGTGCACGGGGCGCACGAGCGGCTGCGTGCCGAGCCCGTGGGC 

101 TFLVRDSRQRNCFFALSVKM 

301 ACCTTCTTGGTGCGCGACAGTCGTCAACGGAACTGCTTCTTCGCGCTCAGCGTGAAGATG

121 ASGPTSIRVHFQAGRFHLDG

361 GCTTCGGGCCCCACGAGCATCCGCGTGCACTTCCAGGCCGGCCGCTTCCACTTGGACGGC 

141 SRETFDCLFELLEHYVAAPR 

421 AGCCGCGAGACCTTCGACTGCCTTTTCGAGCTGCTGGAGCACTACGTGGCGGCGCCGCGC

161 RMLGAPLRQRRVRPLQELCR 

481 CGCATGTTGGGGGCCCCGCTGCGCCAGCGCCGCGTGCGGCCGCTGCAGGAGCTGTGTCGC

181 QRIVAAVGRENLARIPLNPV 

541 CAGCGCATCGTGGCCGCCGTGGGTCGCGAGAACCTGGCGCGCATCCCTCTTAACCCGGTA

201 LRDYLSSFPFQI* 

601 CTCCGTGACTACCTGAGTTCCTTCCCCTTCCAGATCtgaccggctgccgctgtgccgcag 

661 cattaagtgggggcgccttattatttcttattattaattattattatttttctggaacca
721 cgtgggagccctccccgcctgggtcggagggagtggttgtggagggtgagatgcctccca

781 cttctggctggagacctcatcccacctctcaggggtgggggtgctcccctcctggtgctc
841 cctccgggtcccccctggttgtagcagcttgtgtctggggccaggacctgaattccactc 
901 ctacctctccatgtttacatattcccagtatctttgcacaaaccaggggtcggggagggt 
961 ctctggcttcatttttctgctgtgcagaatatcctattttatatttttacagccagttta 
1021 oat aataaa ctttattataaaaatttttttttaaaaqaaaaaaaaaaaaaaaaaa 



FIG 3B 
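FIG 3B pairs each nucleotide row with its deduced amino acid row. As an illustration only, the deduction can be reproduced with the standard genetic code. The sketch below assumes an in-frame coding sequence that begins at nucleotide 1. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with the first base varying
# slowest, in T, C, A, G order (TTT, TTC, TTA, TTG, TCT, ...); "*" is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acid codes.

    Lower-case letters (as used in the figure for flanking sequence) are
    accepted, and any incomplete trailing codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 1-60 and 601-639 of FIG 3B.
print(translate("ATGGTAGCACGCAACCAGGTGGCAGCCGACAATGCGATCTCCCCGGCAGCAGAGCCCCGA"))
print(translate("CTCCGTGACTACCTGAGTTCCTTCCCCTTCCAGATCtga"))
```

Applied to nucleotides 1 to 60 and 601 to 639, the sketch returns MVARNQVAADNAISPAAEPR and LRDYLSSFPFQI*. These match the deduced rows numbered 1 and 201 in the figure, where * corresponds to the TGA stop codon.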
FIG 7A

FIG 7B

FIG 9(I)

FIG 9(II)

FIG 9(III)

FIG 9

FIG 49